Adaptive estimation of the transition density of a Markov chain



Claire Lacour 

Laboratoire MAP5, Université Paris 5, 45, rue des Saints-Pères, 75270 Paris Cedex 06, France
lacour@math-info.univ-paris5.fr



Abstract 

In this paper a new estimator for the transition density $\pi$ of a homogeneous Markov chain is considered. We introduce an original contrast derived from the regression framework and we use a model selection method to estimate $\pi$ under mild conditions. The resulting estimate is adaptive, with an optimal rate of convergence over a large range of anisotropic Besov spaces $B^{(\alpha_1,\alpha_2)}_{2,\infty}$. Some applications and simulations are also presented.

1 Introduction



We consider a homogeneous Markov chain $(X_i)$. The purpose of this paper is to estimate the transition density of such a chain. This quantity describes the form of dependence between successive variables and is defined by $\pi(x,y)dy = P(X_{i+1} \in dy \mid X_i = x)$. It also makes it possible to compute other quantities, such as $E[F(X_{i+1}) \mid X_i = x]$. Like many authors, we choose a nonparametric approach. Roussas [1] was the first to study an estimator of the transition density of a Markov chain. He proves the consistency and the asymptotic normality of a kernel estimator for chains satisfying a strong condition known as Doeblin's hypothesis. In Bosq [2], an estimator by projection is studied in a mixing framework and its consistency is also proved. Basu and Sahoo [3] establish a Berry-Esseen inequality for a kernel estimator under an assumption introduced by Rosenblatt, which is weaker than Doeblin's hypothesis. Athreya and Atuncar [4] improve the result of Roussas since they only need the Harris recurrence of the Markov chain. Other authors are interested in the estimation of the transition density in the non-stationary case: Doukhan and Ghindès [5] bound the integrated risk for any initial distribution. In [6], recursive estimators for a non-stationary Markov chain are described. More recently, Clémençon [7] computes the lower bound of the minimax $L^p$ risk and describes a quotient estimator using wavelets. Lacour [8] finds an estimator by projection with model selection that reaches the optimal rate of convergence.

All these authors have estimated $\pi$ by observing that $\pi = g/f$, where $g$ is the density of $(X_i, X_{i+1})$ and $f$ the stationary density. If $\hat g$ and $\hat f$ are estimators of $g$ and $f$, then an estimator of $\pi$ can be obtained by writing $\hat\pi = \hat g/\hat f$. But this method has the drawback that the resulting rate of convergence depends on the regularity of $f$, and the stationary density $f$ can be less regular than the transition density.

The aim here is to find an estimator $\tilde\pi$ of $\pi$ from the observations $X_1, \ldots, X_{n+1}$ such that the order of the $L^2$ risk depends only on the regularity of $\pi$ and is optimal.

Clémençon [7] introduces an estimation procedure based on an analogy with the regression framework, using the thresholding of wavelet coefficients for regular Markov chains. We propose in this paper another method based on regression, which improves the rate and has the advantage of being computable in practice. Indeed, this method reaches the optimal rate of convergence, without the logarithmic loss obtained by Clémençon [7], and can be applied to $\beta$-mixing Markov chains (the notion of "regular" Markov chains in [7] is equivalent to $\phi$-mixing and is thus a stronger assumption). We use model selection via penalization as described in [9] with a new contrast inspired by the classical regression contrast. To deal with the dependence we use auxiliary variables $X_i^*$ as in [10]. But contrary to most estimation procedures of this kind, our penalty does not contain any mixing term and is entirely computable.

In addition, we consider transition densities belonging to anisotropic Besov spaces, i.e. with different regularities with respect to the two directions. Our projection spaces (piecewise polynomials, trigonometric polynomials or wavelets) have different dimensions in the two directions, and the procedure automatically selects a well-fitted dimension in each direction. A lower bound for the rate of convergence on anisotropic Besov balls is proved, which shows that our estimation procedure is optimal in the minimax sense.



The paper is organized as follows. In Section 2, we present the assumptions on the Markov chain and on the collections of models, together with examples of chains and models. Section 3 is devoted to the estimation procedure and its link with classical regression. The bound on the empirical risk is established in Section 4 and the $L^2$ control is studied in Section 5, where we compute both an upper bound and a lower bound for the mean integrated squared error. In Section 6, some simulation results are given. The proofs are gathered in the last section.



2 Assumptions 

2.1 Assumptions on the Markov chain 

We consider an irreducible Markov chain $(X_n)$ taking its values in the real line $\mathbb{R}$. We suppose that $(X_n)$ is positive recurrent, i.e. it admits a stationary probability measure $\mu$ (for more details, we refer to [11]). We assume that the distribution $\mu$ has a density $f$ with respect to the Lebesgue measure and that the transition kernel $P(x,A) = P(X_{i+1} \in A \mid X_i = x)$ also has a density, denoted by $\pi$. Since the number of observations is finite, $\pi$ is estimated on a compact set $A = A_1 \times A_2$ only. More precisely, the Markov process is supposed to satisfy the following assumptions:

A1. $(X_n)$ is irreducible and positive recurrent.

A2. The distribution of $X_0$ is equal to $\mu$, thus the chain is (strictly) stationary.

A3. The transition density $\pi$ is bounded on $A$, i.e. $\|\pi\|_\infty := \sup_{(x,y)\in A} |\pi(x,y)| < \infty$.

A4. The stationary density $f$ satisfies $\|f\|_\infty := \sup_{x\in A_1} |f(x)| < \infty$ and there exists a positive real $f_0$ such that, for all $x$ in $A_1$, $f(x) \geq f_0$.

A5. The chain is geometrically $\beta$-mixing ($\beta_q \leq e^{-\gamma q}$), or arithmetically $\beta$-mixing ($\beta_q \leq q^{-\gamma}$).

Since $(X_i)$ is a stationary Markov chain, the $\beta$-mixing is very explicit; the mixing coefficients can be written
$$\beta_q = \int \|P^q(x,\cdot) - \mu\|_{TV}\, f(x)\,dx,$$
where $\|\cdot\|_{TV}$ is the total variation norm (see [12]).

Notice that we distinguish the sets $A_1$ and $A_2$ in this work because the two directions $x$ and $y$ in $\pi(x,y)$ do not play the same role, but in practice $A_1$ and $A_2$ will be equal and identical or close to the value domain of the chain.



2.2 Examples of chains 



Many processes satisfy the previous assumptions, such as (classical or more general) autoregressive processes, or diffusions. Here we give a non-exhaustive list of such chains.



2.2.1 Diffusion processes 



We consider the process $(X_{i\Delta})_{1\le i\le n}$ where $\Delta > 0$ is the observation step and $(X_t)_{t\geq 0}$ is defined by
$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t$$
where $W$ is the standard Brownian motion, $b$ is a locally bounded Borel function and $\sigma$ a uniformly continuous function. We suppose that the drift function $b$ and the diffusion coefficient $\sigma$ satisfy the following conditions, given in [13] (Proposition 1):



(1) there exist $\lambda_-, \lambda_+$ such that $\forall x \neq 0$, $0 < \lambda_- \leq \sigma^2(x) \leq \lambda_+$,

(2) there exist $M_0 > 0$, $\alpha > -1$ and $r > 0$ such that
$$\forall |x| \geq M_0, \quad x\,b(x) \leq -r|x|^{\alpha+1}.$$



Then, if $X_0$ follows the stationary distribution, the discretized process $(X_{i\Delta})_{1\le i\le n}$ satisfies Assumptions A1-A5. Note that the mixing is geometric as soon as $\alpha \geq 0$. The continuity of the transition density ensures that Assumption A3 holds. Moreover, we can write



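$$f(x) = \frac{1}{M\sigma^2(x)}\exp\left(2\int_0^x \frac{b(u)}{\sigma^2(u)}\,du\right)$$
with $M$ such that $\int f = 1$. Consequently Assumption A4 is verified with
$$\|f\|_\infty \leq \frac{1}{M\lambda_-}\exp\left(\frac{2}{\lambda_-}\sup_{x\in A_1}\int_0^x b(u)\,du\right) \quad\text{and}\quad f_0 \geq \frac{1}{M\lambda_+}\exp\left(\frac{2}{\lambda_+}\inf_{x\in A_1}\int_0^x b(u)\,du\right).$$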



2.2.2 Nonlinear AR(1) processes
Let us consider the following process
$$X_n = \varphi(X_{n-1}) + \varepsilon_{X_{n-1},n}$$
where $\varepsilon_{x,n}$ has a positive density $l_x$ with respect to the Lebesgue measure, which does not depend on $n$. We suppose that the following conditions are verified:



(1) There exist $M > 0$ and $\rho < 1$ such that, for all $|x| \geq M$, $|\varphi(x)| \leq \rho|x|$, and $\sup_{|x| < M}|\varphi(x)| < \infty$.

(2) There exist $l_0 > 0$, $l_1 > 0$ such that $\forall x, y$, $l_0 \leq l_x(y) \leq l_1$.

Then Mokkadem [14] proves that the chain is Harris recurrent and geometrically ergodic. This implies that Assumptions A1 and A5 are satisfied. Moreover $\pi(x,y) = l_x(y - \varphi(x))$ and $f(y) = \int f(x)\pi(x,y)\,dx$, and then Assumptions A3-A4 hold with $f_0 \geq l_0$ and $\|f\|_\infty \leq \|\pi\|_\infty \leq l_1$.

2.2.3 ARX(1,1) models

The nonlinear process ARX(1,1) is defined by
$$X_n = F(X_{n-1}, Z_n) + \xi_n$$
where $F$ is bounded and $(\xi_n)$, $(Z_n)$ are independent sequences of i.i.d. random variables with $E|\xi_n| < \infty$. We suppose that the distribution of $Z_n$ has a positive density $l$ with respect to the Lebesgue measure. Assume that there exist $\rho < 1$, a locally bounded and measurable function $h: \mathbb{R} \to \mathbb{R}^+$ such that $E h(Z_n) < \infty$, and positive constants $M, c$ such that
$$\forall |(u,v)| > M, \quad |F(u,v)| < \rho|u| + h(v) - c \quad\text{and}\quad \sup_{|x|\leq M}|F(x)| < \infty.$$
Then Doukhan [12] proves (p. 102) that $(X_n)$ is a geometrically $\beta$-mixing process. We can write
$$\pi(x,y) = \int l(z)\, f_\xi(y - F(x,z))\,dz$$
where $f_\xi$ is the density of $\xi_n$. So, if we assume furthermore that there exist $a_0, a_1 > 0$ such that $a_0 \leq f_\xi \leq a_1$, then Assumptions A3-A4 are verified with $f_0 \geq a_0$ and $\|f\|_\infty \leq \|\pi\|_\infty \leq a_1$.

2.2.4 ARCH processes
The model is
$$X_{n+1} = F(X_n) + G(X_n)\varepsilon_{n+1}$$
where $F$ and $G$ are continuous functions and, for all $x$, $G(x) \neq 0$. We suppose that the distribution of $\varepsilon_n$ has a positive density $l$ with respect to the Lebesgue measure and that there exists $s \geq 1$ such that $E|\varepsilon_n|^s < \infty$. The chain $(X_n)$ satisfies Assumptions A1 and A5 if (see [15])
$$\limsup_{|x|\to\infty}\frac{|F(x)| + |G(x)|\,(E|\varepsilon_n|^s)^{1/s}}{|x|} < 1. \qquad (2)$$
In addition, we assume that $\forall x$, $l_0 \leq l(x) \leq l_1$. Then Assumption A3 is verified with $\|\pi\|_\infty \leq l_1/\inf_{x\in A_1}|G(x)|$, and Assumption A4 holds with $f_0 \geq l_0\int f|G|^{-1}$ and $\|f\|_\infty \leq l_1\int f|G|^{-1}$.



2.3 Assumptions on the models

In order to estimate $\pi$, we need to introduce a collection $\{S_m, m \in \mathcal{M}_n\}$ of spaces, which we call models. For each $m = (m_1, m_2)$, $S_m$ is a space of functions with support in $A$ defined from two spaces: $F_{m_1}$ and $H_{m_2}$. $F_{m_1}$ is a subspace of $(L^2 \cap L^\infty)(\mathbb{R})$ spanned by an orthonormal basis $(\varphi_j^m)_{j\in J_m}$ with $|J_m| = D_{m_1}$ such that, for all $j$, the support of $\varphi_j^m$ is included in $A_1$. In the same way, $H_{m_2}$ is a subspace of $(L^2 \cap L^\infty)(\mathbb{R})$ spanned by an orthonormal basis $(\psi_k^m)_{k\in K_m}$ with $|K_m| = D_{m_2}$ such that, for all $k$, the support of $\psi_k^m$ is included in $A_2$. Here $j$ and $k$ are not necessarily integers; they can be pairs of integers, as in the case of a piecewise polynomial space. Then, we define
$$S_m = F_{m_1} \otimes H_{m_2} = \Big\{t,\ t(x,y) = \sum_{j\in J_m}\sum_{k\in K_m} a^m_{j,k}\,\varphi_j^m(x)\psi_k^m(y)\Big\}.$$

The assumptions on the models are the following:

M1. For all $m_2$, $D_{m_2} \leq n^{1/3}$, and $\mathcal{D}_n := \max_{m\in\mathcal{M}_n} D_{m_1} \leq n^{1/3}$.

M2. There exist positive reals $\phi_1, \phi_2$ such that, for all $u$ in $F_{m_1}$, $\|u\|_\infty^2 \leq \phi_1 D_{m_1}\int u^2$, and for all $v$ in $H_{m_2}$, $\sup_{x\in A_2}|v(x)|^2 \leq \phi_2 D_{m_2}\int v^2$. By letting $\phi_0 = \sqrt{\phi_1\phi_2}$, this leads to
$$\forall t \in S_m, \quad \|t\|_\infty \leq \phi_0\sqrt{D_{m_1}D_{m_2}}\,\|t\| \qquad (3)$$
where $\|t\|^2 = \int_{\mathbb{R}^2} t^2(x,y)\,dx\,dy$.

M3. $D_{m_1} \leq D_{m_1'} \Rightarrow F_{m_1} \subset F_{m_1'}$ and $D_{m_2} \leq D_{m_2'} \Rightarrow H_{m_2} \subset H_{m_2'}$.

The first assumption guarantees that $\dim S_m = D_{m_1}D_{m_2} \leq n^{2/3} \leq n$, where $n$ is the number of observations. Condition M2 implies a useful link between the $L^2$ norm and the infinite norm. The third assumption ensures that, for $m$ and $m'$ in $\mathcal{M}_n$, $S_m + S_{m'}$ is included in a model (since $S_m + S_{m'} \subset S_{m''}$ with $D_{m_1''} = \max(D_{m_1}, D_{m_1'})$ and $D_{m_2''} = \max(D_{m_2}, D_{m_2'})$). We denote by $S$ the space with maximal dimension among the $(S_m)_{m\in\mathcal{M}_n}$. Thus, for all $m$ in $\mathcal{M}_n$, $S_m \subset S$.
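For the reader's convenience, here is a short derivation of (3) from M2 (a sketch which only uses the orthonormality of the $\psi_k^m$ and the two inequalities of M2). Write $t(x,y) = \sum_{k\in K_m} u_k(x)\psi_k^m(y)$ with $u_k = \sum_{j\in J_m} a_{j,k}\varphi_j^m \in F_{m_1}$. For fixed $x$, the function $y \mapsto t(x,y)$ belongs to $H_{m_2}$, hence
$$
\begin{aligned}
|t(x,y)|^2 &\leq \phi_2 D_{m_2}\int t^2(x,y')\,dy' = \phi_2 D_{m_2}\sum_{k\in K_m} u_k^2(x)\\
&\leq \phi_2 D_{m_2}\sum_{k\in K_m}\phi_1 D_{m_1}\int u_k^2 = \phi_1\phi_2 D_{m_1}D_{m_2}\sum_{j,k} a_{j,k}^2 = \phi_0^2 D_{m_1}D_{m_2}\|t\|^2,
\end{aligned}
$$
and taking the supremum over $(x,y)$ gives (3).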
2.4 Examples of models

We show here that Assumptions M1-M3 are not too restrictive. Indeed, they are verified for the spaces $F_{m_1}$ (and $H_{m_2}$) spanned by the following bases (see [9]):

• Trigonometric basis: for $A = [0,1]$, $\langle \varphi_0, \ldots, \varphi_{m_1-1}\rangle$ with $\varphi_0 = \mathbf{1}_{[0,1]}$, $\varphi_{2j}(x) = \sqrt{2}\cos(2\pi jx)\mathbf{1}_{[0,1]}(x)$, $\varphi_{2j-1}(x) = \sqrt{2}\sin(2\pi jx)\mathbf{1}_{[0,1]}(x)$ for $j \geq 1$. For this model, $D_{m_1} = m_1$ and $\phi_1 = 2$ hold.

• Histogram basis: for $A = [0,1]$, $\langle \varphi_1, \ldots, \varphi_{2^{m_1}}\rangle$ with $\varphi_j = 2^{m_1/2}\mathbf{1}_{[(j-1)/2^{m_1},\, j/2^{m_1}[}$ for $j = 1, \ldots, 2^{m_1}$. Here $D_{m_1} = 2^{m_1}$ and $\phi_1 = 1$.

• Regular piecewise polynomial basis: for $A = [0,1]$, polynomials of degree $0, \ldots, r$ (where $r$ is fixed) on each interval $[(l-1)/2^D, l/2^D[$, $l = 1, \ldots, 2^D$. In this case, $m_1 = (D, r)$, $J_m = \{j = (l,d),\ 1 \leq l \leq 2^D,\ 0 \leq d \leq r\}$ and $D_{m_1} = (r+1)2^D$. We can put $\phi_1 = r + 1$.

• Regular wavelet basis: $\langle \Psi_{lk},\ l = -1, \ldots, m_1,\ k \in \Lambda(l)\rangle$, where $\Psi_{-1,k}$ denotes the translates of the father wavelet and $\Psi_{lk}(x) = 2^{l/2}\Psi(2^l x - k)$, where $\Psi$ is the mother wavelet. We assume that the support of the wavelets is included in $A_1$ and that … belongs to the Sobolev space …
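The constants in M2 can be checked numerically for the first two bases. By the Cauchy-Schwarz inequality, the smallest admissible $\phi_1$ is $\sup_x \sum_j \varphi_j^2(x)/D_{m_1}$. The sketch below (Python with NumPy; the names `trig_basis`, `hist_basis` and `check_basis` are illustrative and not from the paper) evaluates both bases on a midpoint grid of $[0,1]$, approximates their Gram matrices, and computes that constant. Piecewise polynomial and wavelet bases are not covered here.

```python
import numpy as np

def trig_basis(x, dim):
    """Trigonometric basis on [0, 1]: phi_0 = 1, phi_{2j}(x) = sqrt(2) cos(2 pi j x),
    phi_{2j-1}(x) = sqrt(2) sin(2 pi j x), all multiplied by 1_{[0,1]}(x).
    Returns the matrix (phi_j(x_i)) of shape (len(x), dim)."""
    x = np.asarray(x, dtype=float)
    inside = ((x >= 0) & (x <= 1)).astype(float)
    cols = [np.ones_like(x)]
    for idx in range(1, dim):
        freq = (idx + 1) // 2                      # j such that idx = 2j - 1 or idx = 2j
        wave = np.sin if idx % 2 == 1 else np.cos  # odd index: sine, even index: cosine
        cols.append(np.sqrt(2) * wave(2 * np.pi * freq * x))
    return np.stack(cols, axis=1) * inside[:, None]

def hist_basis(x, level):
    """Histogram basis on [0, 1): 2**level functions 2**(level/2) 1_{[(j-1)/2**level, j/2**level)}."""
    x = np.asarray(x, dtype=float)
    d = 2 ** level
    out = np.zeros((x.size, d))
    rows = np.nonzero((x >= 0) & (x < 1))[0]
    out[rows, np.floor(x[rows] * d).astype(int)] = np.sqrt(d)
    return out

def check_basis(values, grid_step):
    """Riemann approximation of the Gram matrix (L2 inner products) and the smallest
    constant phi_1 such that sup_x sum_j phi_j(x)^2 <= phi_1 * D."""
    gram = values.T @ values * grid_step
    phi1 = np.max(np.sum(values ** 2, axis=1)) / values.shape[1]
    return gram, phi1

N = 4096
grid = (np.arange(N) + 0.5) / N                    # midpoints of a regular grid of [0, 1]
gram_t, phi1_t = check_basis(trig_basis(grid, 9), 1.0 / N)
gram_h, phi1_h = check_basis(hist_basis(grid, 3), 1.0 / N)
print(np.allclose(gram_t, np.eye(9)), round(phi1_t, 3))
print(np.allclose(gram_h, np.eye(8)), round(phi1_h, 3))
```

For an odd trigonometric dimension, $\sum_j \varphi_j^2 \equiv D_{m_1}$, so the computed constant is 1; an even dimension adds one unpaired term and gives at most $(D_{m_1}+1)/D_{m_1} \leq 2$, which is why $\phi_1 = 2$ is a safe choice.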



3 Estimation procedure

3.1 Definition of the contrast

To estimate the function $\pi$, we define the contrast
$$\gamma_n(t) = \frac{1}{n}\sum_{i=1}^n\left[\int_{\mathbb{R}} t^2(X_i,y)\,dy - 2t(X_i,X_{i+1})\right]. \qquad (4)$$
We choose this contrast because
$$E\gamma_n(t) = \|t - \pi\|_f^2 - \|\pi\|_f^2 \quad\text{where}\quad \|t\|_f^2 = \int_{\mathbb{R}^2} t^2(x,y)f(x)\,dx\,dy.$$
Therefore $\gamma_n(t)$ is the empirical counterpart of the $\|\cdot\|_f$-distance between $t$ and $\pi$, and minimizing this contrast amounts to minimizing $\|t - \pi\|_f$. This contrast is new but is actually connected with the one used in regression problems, as we will see in the next subsection.
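The identity for $E\gamma_n(t)$ follows from a direct computation. Under A2, $X_i$ has density $f$ and $(X_i, X_{i+1})$ has density $f(x)\pi(x,y)$, so that, writing $\langle t, s\rangle_f = \iint t(x,y)s(x,y)f(x)\,dx\,dy$,
$$
\begin{aligned}
E\gamma_n(t) &= \iint t^2(x,y)f(x)\,dx\,dy - 2\iint t(x,y)\pi(x,y)f(x)\,dx\,dy\\
&= \|t\|_f^2 - 2\langle t, \pi\rangle_f = \|t - \pi\|_f^2 - \|\pi\|_f^2 .
\end{aligned}
$$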

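We want to estimate $\pi$ by minimizing this contrast on $S_m$. Let $t(x,y) = \sum_{j\in J_m}\sum_{k\in K_m} a_{j,k}\varphi_j^m(x)\psi_k^m(y)$ be a function in $S_m$. Then, if $A_m$ denotes the matrix $(a_{j,k})_{j\in J_m, k\in K_m}$,
$$\forall j_0\ \forall k_0, \quad \frac{\partial\gamma_n(t)}{\partial a_{j_0,k_0}} = 0 \iff G_m A_m = Z_m$$
where
$$G_m = \left(\frac{1}{n}\sum_{i=1}^n \varphi_j^m(X_i)\varphi_l^m(X_i)\right)_{j,l\in J_m}, \qquad Z_m = \left(\frac{1}{n}\sum_{i=1}^n \varphi_j^m(X_i)\psi_k^m(X_{i+1})\right)_{j\in J_m, k\in K_m}.$$
Indeed,
$$\frac{\partial\gamma_n(t)}{\partial a_{j_0,k_0}} = 0 \iff \sum_{j\in J_m} a_{j,k_0}\,\frac{1}{n}\sum_{i=1}^n \varphi_j^m(X_i)\varphi_{j_0}^m(X_i) = \frac{1}{n}\sum_{i=1}^n \varphi_{j_0}^m(X_i)\psi_{k_0}^m(X_{i+1}). \qquad (5)$$

We cannot define a unique minimizer of the contrast $\gamma_n(t)$, since $G_m$ is not necessarily invertible. For example, $G_m$ is not invertible if there exists $j_0$ in $J_m$ such that there is no observation in the support of $\varphi_{j_0}^m$ ($G_m$ then has a null column). This phenomenon happens when localized bases (such as histogram bases or piecewise polynomial bases) are used. However, the following proposition will enable us to define an estimator: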



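Proposition 1
$$\forall j_0\ \forall k_0, \quad \frac{\partial\gamma_n(t)}{\partial a_{j_0,k_0}} = 0 \iff (t(X_i,y))_{1\le i\le n} = P_W\left(\Big(\sum_{k\in K_m}\psi_k^m(X_{i+1})\psi_k^m(y)\Big)_{1\le i\le n}\right)$$
where $P_W$ denotes the orthogonal projection on $W = \{(t(X_i,y))_{1\le i\le n},\ t \in S_m\}$, for the inner product $\langle (u_i), (v_i)\rangle = \sum_{i=1}^n \int u_i(y)v_i(y)\,dy$.

Thus the minimization of $\gamma_n(t)$ leads to a unique vector $(\hat\pi_m(X_i,y))_{1\le i\le n}$, defined as the projection of $(\sum_k \psi_k^m(X_{i+1})\psi_k^m(y))_{1\le i\le n}$ on $W$. The associated function $\hat\pi_m(\cdot,\cdot)$ is not uniquely defined, but we can choose a function $\hat\pi_m$ in $S_m$ whose values at $(X_i,y)$ are fixed according to Proposition 1. For the sake of simplicity, we denote
$$\hat\pi_m = \arg\min_{t\in S_m}\gamma_n(t).$$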

This underlying function is rather a theoretical tool, and the estimator is actually the vector $(\hat\pi_m(X_i,y))_{1\le i\le n}$. This remark leads us to consider the risk defined with the empirical norm
$$\|t\|_n = \left(\frac{1}{n}\sum_{i=1}^n \int_{\mathbb{R}} t^2(X_i,y)\,dy\right)^{1/2}. \qquad (6)$$
This norm is the natural distance in this problem, and we can notice that if $t$ is deterministic with support included in $A_1 \times \mathbb{R}$,
$$f_0\|t\|^2 \leq E\|t\|_n^2 = \|t\|_f^2 \leq \|f\|_\infty\|t\|^2,$$
and thus the mean of this empirical norm is equivalent to the $L^2$ norm $\|\cdot\|$.
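On a fixed model, the computation only involves $G_m$ and $Z_m$. The sketch below (Python with NumPy; `hist_design` and `fit_projection_estimator` are illustrative names, not the author's code) uses a histogram basis on $[lo, hi)$ renormalized to be orthonormal, solves $G_m A_m = Z_m$ with a minimum-norm least-squares solver so that a singular $G_m$ (empty cells) is handled, and returns $\gamma_n(\hat\pi_m) = \mathrm{Tr}({}^tA_m G_m A_m - 2\,{}^tZ_m A_m)$. The minimum-norm solution is just one choice among all minimizers; any other solution $A_m + N$ satisfies $G_m N = 0$, hence gives the same fitted values $\hat\pi_m(X_i,\cdot)$, in line with Proposition 1.

```python
import numpy as np

def hist_design(x, lo, hi, d):
    """Matrix (phi_j(x_i)) for the histogram basis with d equal cells on [lo, hi),
    normalized to be orthonormal in L2: each function is 1/sqrt(h) on its cell."""
    x = np.asarray(x, dtype=float)
    h = (hi - lo) / d
    out = np.zeros((x.size, d))
    rows = np.nonzero((x >= lo) & (x < hi))[0]
    cells = np.minimum(((x[rows] - lo) / h).astype(int), d - 1)
    out[rows, cells] = 1.0 / np.sqrt(h)
    return out

def fit_projection_estimator(X, phi, psi):
    """X contains X_1, ..., X_{n+1}; phi and psi return design matrices.
    Returns A_m, G_m, Z_m and the contrast value gamma_n(pi_hat_m)."""
    Phi = phi(X[:-1])                    # n x D_m1, entries phi_j(X_i)
    Psi = psi(X[1:])                     # n x D_m2, entries psi_k(X_{i+1})
    n = Phi.shape[0]
    G = Phi.T @ Phi / n                  # G_m
    Z = Phi.T @ Psi / n                  # Z_m
    # Minimum-norm solution; exact because the columns of Z lie in the range of G.
    A = np.linalg.lstsq(G, Z, rcond=None)[0]
    gamma = np.trace(A.T @ G @ A - 2 * Z.T @ A)
    return A, G, Z, gamma

# Illustration on the AR(1) chain of Section 6 (a = 0.5, b = 3, sigma = 1),
# started at its stationary mean, with n = 200 transitions.
rng = np.random.default_rng(0)
X = np.empty(201)
X[0] = 6.0
for i in range(200):
    X[i + 1] = 0.5 * X[i] + 3.0 + rng.standard_normal()
phi = lambda x: hist_design(x, 4.0, 8.0, 16)
psi = lambda y: hist_design(y, 4.0, 8.0, 8)
A, G, Z, gamma = fit_projection_estimator(X, phi, psi)
print(np.allclose(G @ A, Z), np.isclose(gamma, -np.trace(Z.T @ A)))
```

At a solution of $G_m A_m = Z_m$ the contrast reduces to $-\mathrm{Tr}({}^tZ_m A_m)$, which the last line checks.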

3.2 Link with classical regression 

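Let us fix $k$ in $K_m$ and let
$$Y_{i,k} = \psi_k^m(X_{i+1}) \quad\text{for } i \in \{1, \ldots, n\},$$
$$t_k(x) = \int t(x,y)\psi_k^m(y)\,dy \quad\text{for all } t \text{ in } L^2(\mathbb{R}^2).$$
Actually, $Y_{i,k}$ and $t_k$ depend on $m$, but we do not mention this for the sake of simplicity. For the same reason, in this subsection we denote $\psi_k^m$ by $\psi_k$ and $\varphi_j^m$ by $\varphi_j$. Then, if $t$ belongs to $S_m$,
$$t(x,y) = \sum_{j\in J_m}\sum_{k\in K_m}\left(\iint t(x',y')\varphi_j(x')\psi_k(y')\,dx'dy'\right)\varphi_j(x)\psi_k(y) = \sum_{j\in J_m}\sum_{k\in K_m}\left(\int t_k(x')\varphi_j(x')\,dx'\right)\varphi_j(x)\psi_k(y) = \sum_{k\in K_m} t_k(x)\psi_k(y),$$
and then, by replacing this expression of $t$ in $\gamma_n(t)$, we obtain
$$
\begin{aligned}
\gamma_n(t) &= \frac{1}{n}\sum_{i=1}^n\left[\sum_{k,k'} t_k(X_i)t_{k'}(X_i)\int\psi_k(y)\psi_{k'}(y)\,dy - 2\sum_k t_k(X_i)\psi_k(X_{i+1})\right]\\
&= \frac{1}{n}\sum_{i=1}^n\sum_{k\in K_m}\left[t_k^2(X_i) - 2t_k(X_i)Y_{i,k}\right] = \frac{1}{n}\sum_{i=1}^n\sum_{k\in K_m}\left\{[t_k(X_i) - Y_{i,k}]^2 - Y_{i,k}^2\right\}.
\end{aligned}
$$
Consequently
$$\min_{t\in S_m}\gamma_n(t) = \sum_{k\in K_m}\min_{t_k\in F_{m_1}}\frac{1}{n}\sum_{i=1}^n\left\{[t_k(X_i) - Y_{i,k}]^2 - Y_{i,k}^2\right\}.$$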

We recognize, for all $k$, the least squares contrast, which is used in regression problems. Here the regression function is $\pi_k = \int\pi(\cdot,y)\psi_k(y)\,dy$, which verifies
$$Y_{i,k} = \pi_k(X_i) + \varepsilon_{i,k} \qquad (7)$$
where
$$\varepsilon_{i,k} = \psi_k(X_{i+1}) - E[\psi_k(X_{i+1}) \mid X_i].$$
The estimator $\hat\pi_m$ can be written as $\sum_{k\in K_m}\hat\pi_k(x)\psi_k(y)$, where $\hat\pi_k$ is the classical least squares estimator for the regression model (7) (as previously, only the vector $(\hat\pi_k(X_i))_{1\le i\le n}$ is uniquely defined).
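The reduction to $D_{m_2}$ separate regressions can be checked numerically: $G_m A_m = Z_m$ are exactly the normal equations of the least-squares regressions of $Y_{i,k} = \psi_k(X_{i+1})$ on $\varphi_1(X_i), \ldots, \varphi_{D_{m_1}}(X_i)$. The sketch below (Python with NumPy; function names are illustrative) compares both computations on a simulated AR(1) path, with the trigonometric basis of Subsection 2.4 transported to $[4,8]$ and rescaled to stay orthonormal. It assumes $G_m$ is invertible, which is the case here for a smooth basis of small dimension.

```python
import numpy as np

def trig_design(x, lo, hi, dim):
    """Trigonometric basis of Subsection 2.4 transported to [lo, hi] and
    rescaled by 1/sqrt(hi - lo) so that it stays orthonormal in L2."""
    u = (np.asarray(x, dtype=float) - lo) / (hi - lo)
    inside = ((u >= 0) & (u <= 1)).astype(float)
    cols = [np.ones_like(u)]
    for idx in range(1, dim):
        freq = (idx + 1) // 2
        wave = np.sin if idx % 2 == 1 else np.cos
        cols.append(np.sqrt(2) * wave(2 * np.pi * freq * u))
    return np.stack(cols, axis=1) * inside[:, None] / np.sqrt(hi - lo)

def coefficients_normal_equations(Phi, Psi):
    """Solve G_m A_m = Z_m with G_m = Phi^T Phi / n and Z_m = Phi^T Psi / n."""
    n = Phi.shape[0]
    return np.linalg.solve(Phi.T @ Phi / n, Phi.T @ Psi / n)

def coefficients_by_regression(Phi, Psi):
    """Column k: least squares of Y_{i,k} = psi_k(X_{i+1}) on phi_1(X_i), ..., phi_D(X_i)."""
    return np.column_stack([np.linalg.lstsq(Phi, Psi[:, k], rcond=None)[0]
                            for k in range(Psi.shape[1])])

rng = np.random.default_rng(1)
X = np.empty(501)
X[0] = 6.0
for i in range(500):
    X[i + 1] = 0.5 * X[i] + 3.0 + rng.standard_normal()
Phi = trig_design(X[:-1], 4.0, 8.0, 5)    # phi_j(X_i)
Psi = trig_design(X[1:], 4.0, 8.0, 3)     # psi_k(X_{i+1}) = Y_{i,k}
A_normal = coefficients_normal_equations(Phi, Psi)
A_regress = coefficients_by_regression(Phi, Psi)
print(np.allclose(A_normal, A_regress))
```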

This regression model is used in Clémençon [7] to estimate the transition density. In the same manner, we could use here the contrast $\gamma_n^{(k)}(t) = \frac{1}{n}\sum_{i=1}^n[\psi_k(X_{i+1}) - t(X_i)]^2$ to take advantage of the analogy with regression. This method gives a good estimation of the projection of $\pi$ on some $S_m$ by first estimating each $\pi_k$, but it does not provide an adaptive method. Model selection requires a more global contrast, as described in [9].



3.3 Definition of the estimator

We thus have an estimator of $\pi$ for each $S_m$. Let now
$$\hat m = \arg\min_{m\in\mathcal{M}_n}\{\gamma_n(\hat\pi_m) + \mathrm{pen}(m)\}$$
where pen is a penalty function to be specified later. Then we can define $\tilde\pi = \hat\pi_{\hat m}$ and compute the empirical mean integrated squared error $E\|\tilde\pi - \pi\mathbf{1}_A\|_n^2$, where $\|\cdot\|_n$ is the empirical norm defined in (6).



4 Calculation of the risk

For a function $h$ and a subspace $S$, let
$$d(h,S) = \inf_{g\in S}\|h - g\| = \inf_{g\in S}\left(\iint |h(x,y) - g(x,y)|^2\,dx\,dy\right)^{1/2}.$$
Using an inequality of Talagrand [16], we can prove the following result.

Theorem 2 We consider a Markov chain satisfying Assumptions A1-A5 (with $\gamma > 14$ in the case of an arithmetic mixing). We consider $\tilde\pi$ the estimator of the transition density $\pi$ described in Section 3, with models verifying Assumptions M1-M3 and the following penalty:
$$\mathrm{pen}(m) = K\|\pi\|_\infty\frac{D_{m_1}D_{m_2}}{n} \qquad (9)$$
where $K$ is a numerical constant. Then
$$E\|\tilde\pi - \pi\mathbf{1}_A\|_n^2 \leq C\inf_{m\in\mathcal{M}_n}\{d^2(\pi\mathbf{1}_A, S_m) + \mathrm{pen}(m)\} + \frac{C'}{n}$$
where $C = \max(5\|f\|_\infty, 6)$ and $C'$ is a constant depending on $\phi_1, \phi_2, \|\pi\|_\infty, f_0, \|f\|_\infty, \gamma$.

The constant $K$ in the penalty is purely numerical (we can choose $K = 45$). We observe that the term $\|\pi\|_\infty$ appears in the penalty although it is unknown. Nevertheless, it can be replaced by any upper bound of $\|\pi\|_\infty$. Moreover, it is possible to use $\|\hat\pi\|_\infty$, where $\hat\pi$ is some estimator of $\pi$. This method of random penalty (specifically with the infinite norm) is successfully used in [17] and [18], for example, and can be applied here provided that $\pi$ is regular enough. This is proved in the appendix.

It is worth noting that the penalty term does not contain any mixing term and is therefore entirely computable. This is in fact related to martingale properties of the underlying empirical processes. The constant $K$ is a fixed universal numerical constant; for practical purposes, it is adjusted by simulations.

We are now interested in the rate of convergence of the risk. We consider that $\pi$ restricted to $A$ belongs to the anisotropic Besov space on $A$ with regularity $\alpha = (\alpha_1, \alpha_2)$. Note that if $\pi$ belongs to $B^\alpha_{2,\infty}(\mathbb{R}^2)$, then $\pi$ restricted to $A$ belongs to $B^\alpha_{2,\infty}(A)$. Let us recall the definition of $B^\alpha_{2,\infty}(A)$. Let $e_1$ and $e_2$ be the canonical basis vectors in $\mathbb{R}^2$ and, for $i = 1, 2$, $A^r_{h,i} = \{x \in \mathbb{R}^2;\ x, x + he_i, \ldots, x + rhe_i \in A\}$. Next, for $x$ in $A^r_{h,i}$, let
$$\Delta^r_{h,i}g(x) = \sum_{k=0}^r (-1)^{r-k}\binom{r}{k}g(x + khe_i)$$
be the $r$th difference operator with step $h$. For $t > 0$, the directional moduli of smoothness are given by
$$\omega_{r_i,i}(g,t) = \sup_{|h|\leq t}\left(\int_{A^{r_i}_{h,i}}|\Delta^{r_i}_{h,i}g(x)|^2\,dx\right)^{1/2}.$$
We say that $g$ is in the Besov space $B^\alpha_{2,\infty}(A)$ if
$$\sup_{t>0}\sum_{i=1}^2 t^{-\alpha_i}\omega_{r_i,i}(g,t) < \infty$$
for $r_i$ integers larger than $\alpha_i$. The transition density $\pi$ can thus have different smoothness properties with respect to different directions. The procedure described here allows an adaptation of the approximation space to each directional regularity. More precisely, if $\alpha_2 > \alpha_1$ for example, the estimator chooses a space of dimension $D_{m_2} = D_{m_1}^{\alpha_1/\alpha_2} < D_{m_1}$ for the second direction, where $\pi$ is more regular. We can thus write the following corollary.

Corollary 3 We suppose that $\pi$ restricted to $A$ belongs to the anisotropic Besov space $B^\alpha_{2,\infty}(A)$ with regularity $\alpha = (\alpha_1, \alpha_2)$ such that $\alpha_1 - 2\alpha_2 + 2\alpha_1\alpha_2 > 0$ and $\alpha_2 - 2\alpha_1 + 2\alpha_1\alpha_2 > 0$. We consider the spaces described in Subsection 2.4 (with the regularity $r$ of the polynomials and the wavelets larger than $\alpha_i - 1$). Then, under the assumptions of Theorem 2,
$$E\|\tilde\pi - \pi\mathbf{1}_A\|_n^2 = O\left(n^{-\frac{2\bar\alpha}{2\bar\alpha+2}}\right)$$
where $\bar\alpha$ is the harmonic mean of $\alpha_1$ and $\alpha_2$.

The harmonic mean of $\alpha_1$ and $\alpha_2$ is the real $\bar\alpha$ such that $2/\bar\alpha = 1/\alpha_1 + 1/\alpha_2$. Note that the condition $\alpha_1 - 2\alpha_2 + 2\alpha_1\alpha_2 > 0$ is ensured as soon as $\alpha_1 \geq 1$, and the condition $\alpha_2 - 2\alpha_1 + 2\alpha_1\alpha_2 > 0$ as soon as $\alpha_2 \geq 1$.

Thus we obtain the rate of convergence $n^{-\frac{2\bar\alpha}{2\bar\alpha+2}}$, which is optimal in the minimax sense (see Section 5.3 for the lower bound).
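The exponents above are easy to evaluate. The sketch below (Python; function names are illustrative) computes $\bar\alpha$, the exponent $2\bar\alpha/(2\bar\alpha+2)$ and the real-valued dimensions $D_{m_i} = n^{\bar\alpha/(2\alpha_i(\bar\alpha+1))}$ obtained by balancing the bias terms $D_{m_1}^{-2\alpha_1}$, $D_{m_2}^{-2\alpha_2}$ against the penalty order $D_{m_1}D_{m_2}/n$. This balancing is a standard heuristic that we assume here, not a statement taken from the proofs; it is consistent with $D_{m_2} = D_{m_1}^{\alpha_1/\alpha_2}$ above, and one can check that $D_{m_i} \leq n^{1/3}$ (Assumption M1) is, up to equality cases, equivalent to the two conditions of Corollary 3.

```python
import math

def harmonic_mean(a1, a2):
    """abar such that 2 / abar = 1 / a1 + 1 / a2."""
    return 2.0 / (1.0 / a1 + 1.0 / a2)

def balanced_dimensions(a1, a2, n):
    """Rate exponent 2*abar/(2*abar+2) and the real-valued dimensions
    D_i = n**(abar / (2 * a_i * (abar + 1))) that balance D_1**(-2*a1),
    D_2**(-2*a2) and D_1 * D_2 / n (before rounding to admissible dimensions)."""
    abar = harmonic_mean(a1, a2)
    exponent = 2 * abar / (2 * abar + 2)
    d1 = n ** (abar / (2 * a1 * (abar + 1)))
    d2 = n ** (abar / (2 * a2 * (abar + 1)))
    return exponent, d1, d2

def corollary_conditions(a1, a2):
    """Regularity conditions of Corollaries 3 and 5."""
    return a1 - 2 * a2 + 2 * a1 * a2 > 0 and a2 - 2 * a1 + 2 * a1 * a2 > 0

n = 10_000
exponent, d1, d2 = balanced_dimensions(1.0, 3.0, n)
print(harmonic_mean(1.0, 3.0), exponent)                                        # 1.5 0.6
print(d1 <= n ** (1 / 3), d2 <= n ** (1 / 3), corollary_conditions(1.0, 3.0))   # True True True
```

With $\alpha_1 = 1$ and $\alpha_2 = 3$, the smoother second direction receives the smaller dimension ($n^{0.1}$ against $n^{0.3}$).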



5 $L^2$ control

5.1 Estimation procedure

Although the empirical norm is the most natural one in this problem, we are interested in an $L^2$ control of the risk. For this, the estimation procedure must be modified. We truncate the previous estimator in the following way:
$$\pi^* = \begin{cases}\tilde\pi & \text{if } \|\tilde\pi\| \leq k_n,\\ 0 & \text{otherwise,}\end{cases} \qquad (10)$$
with $k_n = n^{2/3}$.

5.2 Calculation of the $L^2$ risk

We obtain in this framework a result similar to Theorem 2.

Theorem 4 We consider a Markov chain satisfying Assumptions A1-A5 (with $\gamma > 20$ in the case of an arithmetic mixing). We consider $\pi^*$ the estimator of the transition density $\pi$ described in Section 5.1. Then
$$E\|\pi^* - \pi\mathbf{1}_A\|^2 \leq C\inf_{m\in\mathcal{M}_n}\{d^2(\pi\mathbf{1}_A, S_m) + \mathrm{pen}(m)\} + \frac{C'}{n}$$
where $C = \max(36f_0^{-1}\|f\|_\infty + 2,\ 36f_0^{-1})$ and $C'$ is a constant depending on $\phi_1, \phi_2, \|\pi\|_\infty, \|\pi\|, f_0, \|f\|_\infty, \gamma$.
If $\pi$ is regular, we can state the following corollary:

Corollary 5 We suppose that the restriction of $\pi$ to $A$ belongs to the anisotropic Besov space $B^\alpha_{2,\infty}(A)$ with regularity $\alpha = (\alpha_1, \alpha_2)$ such that $\alpha_1 - 2\alpha_2 + 2\alpha_1\alpha_2 > 0$ and $\alpha_2 - 2\alpha_1 + 2\alpha_1\alpha_2 > 0$. We consider the spaces described in Subsection 2.4 (with the regularity $r$ of the polynomials and the wavelets larger than $\alpha_i - 1$). Then, under the assumptions of Theorem 4,
$$E\|\pi^* - \pi\mathbf{1}_A\|^2 = O\left(n^{-\frac{2\bar\alpha}{2\bar\alpha+2}}\right)$$
where $\bar\alpha$ is the harmonic mean of $\alpha_1$ and $\alpha_2$.

The same rate of convergence is thus achieved with the $L^2$ norm instead of the empirical norm, and the procedure automatically adapts the two dimensions of the projection spaces to the regularities $\alpha_1$ and $\alpha_2$ of the transition density $\pi$. If $\alpha_1 = 1$, we recognize the rate $n^{-\frac{2\alpha_2}{3\alpha_2+1}}$ established by Birgé [19] with metrical arguments. The optimality is proved in the following subsection.

If $\alpha_1 = \alpha_2 = \alpha$ ("classical" Besov space), then $\bar\alpha = \alpha$, and our result is thus an improvement of that of Clémençon [7], whose procedure only achieves the rate $(\log(n)/n)^{\frac{2\alpha}{2\alpha+2}}$ and only allows the use of wavelets. We can observe that in this case, the condition $\alpha_1 - 2\alpha_2 + 2\alpha_1\alpha_2 > 0$ is equivalent to $\alpha > 1/2$ and so is verified if the function $\pi$ is regular enough.
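Both special cases follow from the definition of $\bar\alpha$ by elementary algebra:
$$\alpha_1 = 1:\quad \bar\alpha = \frac{2\alpha_2}{1 + \alpha_2}, \qquad \frac{2\bar\alpha}{2\bar\alpha + 2} = \frac{\bar\alpha}{\bar\alpha + 1} = \frac{2\alpha_2}{3\alpha_2 + 1};$$
$$\alpha_1 = \alpha_2 = \alpha > 0:\quad \alpha_1 - 2\alpha_2 + 2\alpha_1\alpha_2 = \alpha(2\alpha - 1) > 0 \iff \alpha > 1/2 .$$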

Actually, in the case $\alpha_1 = \alpha_2$, an estimation with isotropic spaces ($D_{m_1} = D_{m_2}$) is preferable. Indeed, in this framework, the models are nested, and so we can consider spaces with larger dimension ($D_m^2 \leq n$ instead of $D_m^2 \leq n^{2/3}$). Then Corollary 3 is valid whatever $\alpha > 0$. Moreover, for an arithmetic mixing, the assumption $\gamma > 6$ is sufficient.



5.3 Lower bound 

We denote by $\|\cdot\|_A$ the norm in $L^2(A)$, i.e. $\|g\|_A = \left(\int_A |g|^2\right)^{1/2}$. We set
$$\mathcal{B} = \{\pi \text{ transition density on } \mathbb{R} \text{ of a positive recurrent Markov chain such that } \|\pi\|_{B^\alpha_{2,\infty}(A)} \leq L\}$$
and $E_\pi$ the expectation corresponding to the distribution of $X_1, \ldots, X_n$ if the true transition density of the Markov chain is $\pi$ and the initial distribution is the stationary distribution.

Theorem 6 There exists a positive constant $C$ such that, if $n$ is large enough,
$$\inf_{\hat\pi_n}\sup_{\pi\in\mathcal{B}} E_\pi\|\hat\pi_n - \pi\|_A^2 \geq Cn^{-\frac{2\bar\alpha}{2\bar\alpha+2}}$$
where the infimum is taken over all estimators $\hat\pi_n$ of $\pi$ based on the observations $X_1, \ldots, X_{n+1}$.

So the lower bound in [7] is generalized to the case $\alpha_1 \neq \alpha_2$. It shows that our procedure reaches the optimal minimax rate, whatever the regularity of $\pi$, without needing to know $\alpha$.



6 Simulations 



To evaluate the performance of our method, we simulate a Markov chain with a known transition density, then we estimate this density and compare the two functions for different values of $n$. The estimation procedure is easy to implement and can be decomposed into the following steps:

• find the coefficient matrix $A_m$ for each $m = (m_1, m_2)$;

• compute $\gamma_n(\hat\pi_m) = \mathrm{Tr}({}^tA_m G_m A_m - 2\,{}^tZ_m A_m)$;

• find $\hat m$ such that $\gamma_n(\hat\pi_m) + \mathrm{pen}(m)$ is minimum;

• compute $\hat\pi_{\hat m}$.

For the first step, we use two different kinds of bases: the histogram bases and the trigonometric bases, as described in Subsection 2.4. We renormalize these bases so that they are defined on the estimation domain $A$ instead of $[0,1]^2$. For the third step, we choose $\mathrm{pen}(m) = 0.5\,D_{m_1}D_{m_2}/n$.
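The four steps can be sketched as follows in Python with NumPy, for the AR(1) chain described below and histogram bases only: dyadic dimensions $D_{m_i} = 2^{l_i} \leq n^{1/3}$ (Assumption M1), renormalized to $A = [4,8]^2$, and the penalty $0.5\,D_{m_1}D_{m_2}/n$. The helper names are ours, the trigonometric variant is omitted, and the printed risk comes from a single simulated path, so it is not comparable with the averages over 200 samples reported in the tables.

```python
import numpy as np

def hist_design(x, lo, hi, d):
    """Orthonormal histogram basis with d equal cells on [lo, hi): value 1/sqrt(h) on each cell."""
    x = np.asarray(x, dtype=float)
    h = (hi - lo) / d
    out = np.zeros((x.size, d))
    rows = np.nonzero((x >= lo) & (x < hi))[0]
    out[rows, np.minimum(((x[rows] - lo) / h).astype(int), d - 1)] = 1.0 / np.sqrt(h)
    return out

def simulate_ar1(n, a=0.5, b=3.0, sigma=1.0, seed=0):
    """X_1, ..., X_{n+1} from X_{i+1} = a X_i + b + eps_{i+1}, started in the stationary law."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1)
    x[0] = rng.normal(b / (1 - a), sigma / np.sqrt(1 - a ** 2))
    for i in range(n):
        x[i + 1] = a * x[i] + b + sigma * rng.standard_normal()
    return x

def select_histogram_model(X, lo, hi, levels, kappa=0.5):
    """Steps 1-3: for each (D_m1, D_m2) = (2**l1, 2**l2), solve G_m A_m = Z_m, compute
    gamma_n(pi_hat_m) and keep the minimizer of gamma_n + kappa * D_m1 * D_m2 / n."""
    n = len(X) - 1
    best = None
    for l1 in levels:
        Phi = hist_design(X[:-1], lo, hi, 2 ** l1)
        G = Phi.T @ Phi / n
        for l2 in levels:
            Z = Phi.T @ hist_design(X[1:], lo, hi, 2 ** l2) / n
            A = np.linalg.lstsq(G, Z, rcond=None)[0]
            gamma = np.trace(A.T @ G @ A - 2 * Z.T @ A)
            crit = gamma + kappa * 2 ** l1 * 2 ** l2 / n
            if best is None or crit < best[0]:
                best = (crit, l1, l2, A)
    return best

def evaluate(A, l1, l2, lo, hi, x, y):
    """Step 4: matrix of pi_hat(x_i, y_k) = sum_{j,k'} a_{j,k'} phi_j(x_i) psi_k'(y_k)."""
    return hist_design(x, lo, hi, 2 ** l1) @ A @ hist_design(y, lo, hi, 2 ** l2).T

n, lo, hi = 1000, 4.0, 8.0
X = simulate_ar1(n)
levels = [l for l in range(8) if 2 ** l <= n ** (1 / 3)]   # Assumption M1
crit, l1, l2, A = select_histogram_model(X, lo, hi, levels)

# Empirical risk ||pi_hat - pi 1_A||_n^2, midpoint rule in y over A_2 = [4, 8].
ny = 400
y = lo + (np.arange(ny) + 0.5) * (hi - lo) / ny
est = evaluate(A, l1, l2, lo, hi, X[:-1], y)
z = y[None, :] - 0.5 * X[:-1, None] - 3.0
true = np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)
in_a1 = ((X[:-1] >= lo) & (X[:-1] < hi))[:, None]
risk = np.mean(np.sum((est - true * in_a1) ** 2, axis=1) * (hi - lo) / ny)
print("selected D_m1, D_m2:", 2 ** l1, 2 ** l2, "empirical risk:", round(risk, 4))
```

With histogram bases the fitted function is a conditional histogram: on a cell $I_j \times J_k$ it equals the number of transitions from $I_j$ to $J_k$ divided by the number of visits to $I_j$ and by the width of $J_k$.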

We consider three Markov chains: 

• An autoregressive process defined by $X_{n+1} = aX_n + b + \varepsilon_{n+1}$, where the $\varepsilon_n$ are i.i.d. centered Gaussian random variables with variance $\sigma^2$. The stationary distribution of this process is Gaussian with mean $b/(1-a)$ and variance $\sigma^2/(1-a^2)$. The transition density is $\pi(x,y) = \varphi(y - ax - b)$, where $\varphi(z) = \frac{1}{\sigma\sqrt{2\pi}}\exp(-z^2/2\sigma^2)$ is the density of a centered Gaussian with variance $\sigma^2$. Here we choose $a = 0.5$, $b = 3$, $\sigma = 1$ and we denote this process AR(1). It is estimated on $[4,8]^2$.

• A discrete radial Ornstein-Uhlenbeck process, i.e. the Euclidean norm of a vector $(\xi^1, \xi^2, \xi^3)$ whose components are i.i.d. processes satisfying, for $j = 1, 2, 3$, $\xi^j_{n+1} = a\xi^j_n + \beta\varepsilon^j_n$, where the $\varepsilon^j_n$ are i.i.d. standard Gaussian. This process is studied in detail in [20]. Its transition density is
$$\pi(x,y) = \mathbf{1}_{y>0}\,\frac{y}{\beta^2}\exp\left(-\frac{y^2 + a^2x^2}{2\beta^2}\right)I_{1/2}\left(\frac{axy}{\beta^2}\right)\sqrt{\frac{y}{ax}}$$
where $I_{1/2}$ is the modified Bessel function of the first kind with index 1/2. The stationary density of this chain is $f(x) = \mathbf{1}_{x>0}\exp\{-x^2/2\rho^2\}\,2x^2/(\rho^3\sqrt{2\pi})$ with $\rho^2 = \beta^2/(1-a^2)$. We choose $a = 0.5$, $\beta = 3$ and we denote this process by $\sqrt{\mathrm{CIR}}$ since it is the square root of a Cox-Ingersoll-Ross process. The estimation domain is $[2,10]^2$.

• An ARCH process defined by $X_{n+1} = \sin(X_n) + (\cos(X_n) + 3)\varepsilon_{n+1}$, where the $\varepsilon_{n+1}$ are i.i.d. standard Gaussian. One can check that condition (2) is satisfied. Here the transition density is
$$\pi(x,y) = \frac{1}{(\cos x + 3)\sqrt{2\pi}}\exp\left(-\frac{(y - \sin x)^2}{2(\cos x + 3)^2}\right).$$
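The three transition densities can be coded directly; below is a sketch in Python with NumPy (function names are ours). For the $\sqrt{\mathrm{CIR}}$ chain it uses the identity $I_{1/2}(z) = \sqrt{2/(\pi z)}\sinh z$, which turns the formula above into $\pi(x,y) = \frac{y}{ax\beta\sqrt{2\pi}}\big[e^{-(y-ax)^2/2\beta^2} - e^{-(y+ax)^2/2\beta^2}\big]$ for $y > 0$ and avoids overflow for large arguments. The checks verify numerically that each $\pi(x,\cdot)$ integrates to one and that $f$ is invariant for the $\sqrt{\mathrm{CIR}}$ kernel, i.e. $\int f(x)\pi(x,y)\,dx = f(y)$.

```python
import numpy as np

SQRT_2PI = np.sqrt(2 * np.pi)

def ar1_density(x, y, a=0.5, b=3.0, sigma=1.0):
    """pi(x, y) for X_{n+1} = a X_n + b + eps, eps ~ N(0, sigma^2)."""
    z = np.asarray(y, dtype=float) - a * np.asarray(x, dtype=float) - b
    return np.exp(-z ** 2 / (2 * sigma ** 2)) / (sigma * SQRT_2PI)

def sqrt_cir_density(x, y, a=0.5, beta=3.0):
    """pi(x, y) for the radial Ornstein-Uhlenbeck chain (x > 0),
    written with I_{1/2}(z) = sqrt(2 / (pi z)) sinh(z)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = a * x
    core = np.exp(-(y - m) ** 2 / (2 * beta ** 2)) - np.exp(-(y + m) ** 2 / (2 * beta ** 2))
    return np.where(y > 0, y * core / (m * beta * SQRT_2PI), 0.0)

def sqrt_cir_stationary(x, a=0.5, beta=3.0):
    """f(x) = 1_{x>0} exp(-x^2 / (2 rho^2)) 2 x^2 / (rho^3 sqrt(2 pi)), rho^2 = beta^2 / (1 - a^2)."""
    x = np.asarray(x, dtype=float)
    rho = beta / np.sqrt(1 - a ** 2)
    return np.where(x > 0, np.exp(-x ** 2 / (2 * rho ** 2)) * 2 * x ** 2 / (rho ** 3 * SQRT_2PI), 0.0)

def arch_density(x, y):
    """pi(x, y) for X_{n+1} = sin(X_n) + (cos(X_n) + 3) eps, eps ~ N(0, 1)."""
    x = np.asarray(x, dtype=float)
    s = np.cos(x) + 3.0
    z = (np.asarray(y, dtype=float) - np.sin(x)) / s
    return np.exp(-z ** 2 / 2) / (s * SQRT_2PI)

dy = 0.001
ys = np.arange(-40.0, 60.0, dy) + dy / 2           # midpoint grid in y
masses = [np.sum(ar1_density(6.0, ys)) * dy,
          np.sum(sqrt_cir_density(5.0, ys)) * dy,
          np.sum(arch_density(1.0, ys)) * dy]
print(np.round(masses, 6))

dx = 0.01
xs = (np.arange(4000) + 0.5) * dx                   # midpoints of (0, 40)
lhs = np.sum(sqrt_cir_stationary(xs) * sqrt_cir_density(xs, 5.0)) * dx
print(round(float(lhs), 5), round(float(sqrt_cir_stationary(5.0)), 5))
```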




Fig. 1. Estimator (light surface) and true function (dark surface) for a $\sqrt{\mathrm{CIR}}$ process estimated with a histogram basis, $n = 1000$.

We can illustrate the results with some figures. Figure 1 shows the surface $z = \pi(x,y)$ and the estimated surface $z = \tilde\pi(x,y)$. We use a histogram basis, and we see that the procedure chooses different dimensions on the abscissa and on the ordinate, since the estimator is constant on rectangles instead of squares. Figure 2 presents sections of this kind of surface for the AR(1) process estimated with trigonometric bases. We can see the curves $z = \pi(4.6, y)$ versus $z = \tilde\pi(4.6, y)$ and the curves $z = \pi(x, 5)$ versus $z = \tilde\pi(x, 5)$. The second section shows that some edge effects may appear, due to the mixed control of the two directions.

For more precise results, the empirical risk and the $L^2$ risk are given in Table 1 and Table 2 respectively.



Fig. 2. Sections (at $x = 4.6$ and at $y = 5$) for the AR(1) process estimated with a trigonometric basis, $n = 1000$. Dark line: true function; light line: estimator.



law      basis   n = 50   n = 100   n = 250   n = 500   n = 1000
AR(1)    H       0.067    0.055     0.043     0.038     0.033
AR(1)    T       0.096    0.081     0.063     0.054     0.045
√CIR     H       0.026    0.023     0.019     0.016     0.014
√CIR     T       0.019    0.015     0.009     0.007     0.006
ARCH     H       0.031    0.027     0.016     0.015     0.014
ARCH     T       0.020    0.012     0.008     0.007     0.007

Table 1
Empirical risk $E\|\tilde\pi - \pi\|_n^2$ for simulated data with $\mathrm{pen}(m) = 0.5D_{m_1}D_{m_2}/n$, averaged over $N = 200$ samples. H: histogram basis, T: trigonometric basis.



law      basis   n = 50   n = 100   n = 250   n = 500   n = 1000
AR(1)    H       0.242    0.189     0.132     0.109     0.085
AR(1)    T       0.438    0.357     0.253     0.213     0.180
√CIR     H       0.152    0.130     0.094     0.066     0.054
√CIR     T       0.152    0.123     0.072     0.052     0.046
ARCH     H       0.367    0.303     0.168     0.156     0.144
ARCH     T       0.249    0.137     0.096     0.092     0.090

Table 2
$L^2$ risk $E\|\pi - \pi^*\|^2$ for simulated data with $\mathrm{pen}(m) = 0.5D_{m_1}D_{m_2}/n$, averaged over $N = 200$ samples. H: histogram basis, T: trigonometric basis.

We observe that the results are better when we consider the empirical norm. This was to be expected, since this norm is adapted to the problem under study.



law      basis   n = 50   n = 100   n = 250   n = 500   n = 1000
AR(1)    H       (illegible)
AR(1)    T       0.081    0.069     0.046     0.037     0.031
√CIR     H       0.016    0.014     0.010     0.006     0.004
√CIR     T       0.018    0.012     0.008     0.006     0.004

Table 3
$L^2(f(x)dxdy)$ risk $E\|\pi - \pi^*\|_f^2$ for simulated data with $\mathrm{pen}(m) = 0.5D_{m_1}D_{m_2}/n$, averaged over $N = 200$ samples. H: histogram basis, T: trigonometric basis.



Actually, the most suitable norm to evaluate the distance between $\pi$ and its estimator is the norm $\|\cdot\|_f$. Table 3 shows that the errors in this case are very satisfactory.

So the results are good overall, but we cannot claim that one basis gives better results than the others. We can then imagine a mixed strategy, i.e. a procedure which uses several kinds of bases and which can choose the best basis. Such techniques are successfully used in a regression framework by Comte and Rozenholc [21], [22].



7 Proofs 



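7.1 Proof of Proposition 1

Equality (5) yields, by multiplying by $\psi_{k_0}(y)$,
$$\frac{1}{n}\sum_{i=1}^n\sum_{j\in J_m} a_{j,k_0}\varphi_j(X_i)\psi_{k_0}(y)\varphi_{j_0}(X_i) = \frac{1}{n}\sum_{i=1}^n\varphi_{j_0}(X_i)\psi_{k_0}(X_{i+1})\psi_{k_0}(y).$$
Then, we sum over $k_0$ in $K_m$:
$$\sum_{i=1}^n t(X_i,y)\varphi_{j_0}(X_i) = \sum_{i=1}^n\sum_{k_0\in K_m}\psi_{k_0}(X_{i+1})\psi_{k_0}(y)\varphi_{j_0}(X_i).$$
If we multiply this equality by $a'_{j_0,k}\psi_k(y)$, sum over $k \in K_m$ and $j_0 \in J_m$, and integrate with respect to $y$, we obtain, with $u(x,y) = \sum_{j_0,k} a'_{j_0,k}\varphi_{j_0}(x)\psi_k(y)$,
$$\sum_{i=1}^n\int t(X_i,y)u(X_i,y)\,dy = \sum_{i=1}^n\int\sum_{k_0\in K_m}\psi_{k_0}(X_{i+1})\psi_{k_0}(y)u(X_i,y)\,dy$$
for all $u$ in $S_m$. So the vector $(t(X_i,y) - \sum_{k\in K_m}\psi_k(X_{i+1})\psi_k(y))_{1\le i\le n}$ is orthogonal to each vector in $W$. Since $(t(X_i,y))_{1\le i\le n}$ belongs to $W$, the proposition is proved.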



7.2 Proof of Theorem 2

For $\rho$ a real larger than 1, let
$$\Omega_\rho = \{\forall t \in S,\ \|t\|_f^2 \leq \rho\|t\|_n^2\}.$$



Let 



In the case of an arithmetical mixing, since 7 > 14, there exists a real c such 
that 

1 

< c < - 
6 

7 

, 7C> 3 

We set in this case q n = | \rf\ . In the case of a geometrical mixing, we set 
<?n = ||_ cm ( n )J where c is a real larger than 7/37. 

For the sake of simplicity, we suppose that $n = 4p_nq_n$, with $p_n$ an integer. Let, for $i = 1, \ldots, n/2$, $U_i = (X_{2i-1}, X_{2i})$, and
$$A_l = (U_{2lq_n+1}, \ldots, U_{(2l+1)q_n}), \quad l = 0, \ldots, p_n - 1,$$
$$B_l = (U_{(2l+1)q_n+1}, \ldots, U_{(2l+2)q_n}), \quad l = 0, \ldots, p_n - 1.$$

We use now the mixing assumption A5. As in Viennet [10J we can build a 
sequence (Af) such that 

A\ and A* have the same distribution, 
A\ and A*, are independent if I ^ I', 
P(At ? At) < f3 2qn . 

In the same way, we build (Bf) and we define for any I £ {0, . . . ,p n — 1}, 
A = (U2i gn +v-,UU l+1) J, Bf = (U( 2l+1)qn+1 ,...,U( 2l+2)qn ) so that the se- 
quence (C/j* , . . . , U*/ 2 ) and then the sequence (X^, . . . , X*) are well defined. 
Let now Vi = (X 2 i, X 2 i + ±) for i — 1, . . . , n/2 and 

Ci = (V2Z 9 „+1, V(2Z+l) gn ) Z = 0, . . . ,p n - 1, 
A= (V(2J+l)g n +l, V( 2 J+2)g n ) / = 0, . . . ,p n , - 1. 
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We can build (V**, V*J 2 ) and then (X? , . . . , X** +1 ) such that 

Ci and C** have the same distribution, 
C** and Cf are independent if / ^ I', 
P(Ci ± cr) < (3 2qn . 

We put X* +1 = X n+1 and X? = X 1 . Now let 

n* = {Vt Xi = X*=X**} and SI* = tt p n fi* 
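The index bookkeeping of this coupling construction is easy to get wrong, so here is a small Python helper to make it concrete. The helper is hypothetical and meant for illustration only. It lists the indices of the $X_i$ covered by the blocks $A_l,B_l$ (built from the $U_i$) and $C_l,D_l$ (built from the $V_i$), assuming $n=4p_nq_n$ as in the text. It also shows that two consecutive $A$ blocks are separated by $2q_n$ observations, which is why $\beta_{2q_n}$ appears.

```python
def blocks(n, q):
    """X-indices covered by A_l, B_l (built from U_i = (X_{2i-1}, X_{2i})) and
    C_l, D_l (built from V_i = (X_{2i}, X_{2i+1})), l = 0..p-1, with n = 4 p q."""
    if n % (4 * q) != 0:
        raise ValueError("the construction assumes n = 4 p_n q_n")
    p = n // (4 * q)

    def u_pair(i):                     # X-indices of U_i
        return (2 * i - 1, 2 * i)

    def v_pair(i):                     # X-indices of V_i
        return (2 * i, 2 * i + 1)

    def cover(first, last, pair):      # X-indices of pair(first), ..., pair(last)
        return [k for i in range(first, last + 1) for k in pair(i)]

    A = [cover(2 * l * q + 1, (2 * l + 1) * q, u_pair) for l in range(p)]
    B = [cover((2 * l + 1) * q + 1, (2 * l + 2) * q, u_pair) for l in range(p)]
    C = [cover(2 * l * q + 1, (2 * l + 1) * q, v_pair) for l in range(p)]
    D = [cover((2 * l + 1) * q + 1, (2 * l + 2) * q, v_pair) for l in range(p)]
    return A, B, C, D

A, B, C, D = blocks(n=48, q=3)
print(A[0], B[0])               # [1, 2, 3, 4, 5, 6] [7, 8, 9, 10, 11, 12]
print(A[1][0] - A[0][-1] - 1)   # 6, i.e. 2q observations between A_0 and A_1
```

Every pair $(X_i,X_{i+1})$ with $i$ odd lies inside one $A$ or $B$ block, and every pair with $i$ even lies inside one $C$ or $D$ block. This is how the sums over odd and even indices are later split.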
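We denote by $\pi_m$ the orthogonal projection of $\pi$ on $S_m$. Now,
$$\mathbb E\|\tilde\pi-\pi\mathbf 1_A\|_n^2=\mathbb E\big(\|\tilde\pi-\pi\mathbf 1_A\|_n^2\mathbf 1_{\Omega_\rho^*}\big)+\mathbb E\big(\|\tilde\pi-\pi\mathbf 1_A\|_n^2\mathbf 1_{\Omega_\rho^{*c}}\big).\qquad(11)$$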



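To bound the first term, we observe that for all $s,t$,
$$\gamma_n(t)-\gamma_n(s)=\|t-\pi\|_n^2-\|s-\pi\|_n^2-2Z_n(t-s),$$
where
$$Z_n(t)=\frac1n\sum_{i=1}^n\Big[t(X_i,X_{i+1})-\int t(X_i,y)\pi(X_i,y)dy\Big].$$
Since, for $t$ with support in $A$, $\|t-\pi\|_n^2=\|t-\pi\mathbf 1_A\|_n^2+\|\pi\mathbf 1_{A^c}\|_n^2$, we can write
$$\gamma_n(t)-\gamma_n(s)=\|t-\pi\mathbf 1_A\|_n^2-\|s-\pi\mathbf 1_A\|_n^2-2Z_n(t-s).$$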



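The definition of $\hat m$ gives, for some fixed $m\in\mathcal M_n$,
$$\gamma_n(\tilde\pi)+\mathrm{pen}(\hat m)\le\gamma_n(\pi_m)+\mathrm{pen}(m).$$
And then
$$\|\tilde\pi-\pi\mathbf 1_A\|_n^2\le\|\pi_m-\pi\mathbf 1_A\|_n^2+2Z_n(\tilde\pi-\pi_m)+\mathrm{pen}(m)-\mathrm{pen}(\hat m)$$
$$\le\|\pi_m-\pi\mathbf 1_A\|_n^2+2\|\tilde\pi-\pi_m\|_f\sup_{t\in B_f(\hat m)}Z_n(t)+\mathrm{pen}(m)-\mathrm{pen}(\hat m),$$
where, for all $m'$, $B_f(m')=\{t\in S_m+S_{m'},\ \|t\|_f=1\}$. Let $\theta$ be a real larger than $2\rho$ and $p(\cdot,\cdot)$ a function such that $\theta p(m,m')\le\mathrm{pen}(m)+\mathrm{pen}(m')$. Then, using $2ab\le\theta^{-1}a^2+\theta b^2$,
$$\|\tilde\pi-\pi\mathbf 1_A\|_n^2\mathbf 1_{\Omega_\rho^*}\le\|\pi_m-\pi\mathbf 1_A\|_n^2+\frac1\theta\|\tilde\pi-\pi_m\|_f^2\mathbf 1_{\Omega_\rho^*}+2\,\mathrm{pen}(m)+\theta\sum_{m'\in\mathcal M_n}\Big[\sup_{t\in B_f(m')}Z_n^2(t)-p(m,m')\Big]_+\mathbf 1_{\Omega_\rho^*}.\qquad(12)$$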



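But $\|\tilde\pi-\pi_m\|_f^2\mathbf 1_{\Omega_\rho^*}\le\rho\|\tilde\pi-\pi_m\|_n^2\mathbf 1_{\Omega_\rho^*}\le2\rho\|\tilde\pi-\pi\mathbf 1_A\|_n^2\mathbf 1_{\Omega_\rho^*}+2\rho\|\pi\mathbf 1_A-\pi_m\|_n^2$.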
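Then, inequality (12) becomes
$$\Big(1-\frac{2\rho}\theta\Big)\|\tilde\pi-\pi\mathbf 1_A\|_n^2\mathbf 1_{\Omega_\rho^*}\le\Big(1+\frac{2\rho}\theta\Big)\|\pi\mathbf 1_A-\pi_m\|_n^2+2\,\mathrm{pen}(m)+\theta\sum_{m'\in\mathcal M_n}\Big[\sup_{t\in B_f(m')}Z_n^2(t)-p(m,m')\Big]_+\mathbf 1_{\Omega_\rho^*},$$
so
$$\mathbb E\big(\|\tilde\pi-\pi\mathbf 1_A\|_n^2\mathbf 1_{\Omega_\rho^*}\big)\le\frac{\theta+2\rho}{\theta-2\rho}\mathbb E\|\pi\mathbf 1_A-\pi_m\|_n^2+\frac{2\theta}{\theta-2\rho}\mathrm{pen}(m)+\frac{\theta^2}{\theta-2\rho}\sum_{m'\in\mathcal M_n}\mathbb E\Big(\Big[\sup_{t\in B_f(m')}Z_n^2(t)-p(m,m')\Big]_+\mathbf 1_{\Omega^*}\Big).\qquad(13)$$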



We now use the following proposition:

Proposition 7 Let $p(m,m')=10\|\pi\|_\infty\frac{D(m,m')}n$, where $D(m,m')$ denotes the dimension of $S_m+S_{m'}$. Then, under the assumptions of Theorem 2, there exists a constant $C_1$ such that
$$\sum_{m'\in\mathcal M_n}\mathbb E\Big(\Big[\sup_{t\in B_f(m')}Z_n^2(t)-p(m,m')\Big]_+\mathbf 1_{\Omega^*}\Big)\le\frac{C_1}n.\qquad(14)$$



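Then, with $\theta=3\rho$, inequalities (13) and (14) yield (using, by stationarity, $\mathbb E\|t\|_n^2=\|t\|_f^2\le\|f\|_\infty\|t\|^2$)
$$\mathbb E\big(\|\tilde\pi-\pi\mathbf 1_A\|_n^2\mathbf 1_{\Omega_\rho^*}\big)\le5\|f\|_\infty\|\pi_m-\pi\mathbf 1_A\|^2+6\,\mathrm{pen}(m)+\frac{9\rho C_1}n.\qquad(15)$$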


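The penalty term $\mathrm{pen}(m)$ has to verify $\mathrm{pen}(m)+\mathrm{pen}(m')\ge30\rho\|\pi\|_\infty\frac{D(m,m')}n$, i.e. $30\rho\|\pi\|_\infty\dim(S_m+S_{m'})/n\le\mathrm{pen}(m)+\mathrm{pen}(m')$. Since $\dim(S_m+S_{m'})\le D_{m_1}D_{m_2}+D_{m'_1}D_{m'_2}$, we choose $\rho=3/2$, and so
$$\mathrm{pen}(m)=45\|\pi\|_\infty\frac{D_{m_1}D_{m_2}}n.$$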
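The numerical constants in (13), (15) and in the penalty come from elementary arithmetic. The short Python check below recomputes them in exact rational arithmetic for $\rho=3/2$ and $\theta=3\rho$, with $p(m,m')=10\|\pi\|_\infty D(m,m')/n$ as in Proposition 7. Since $D(m,m')\le D_{m_1}D_{m_2}+D_{m'_1}D_{m'_2}$, the constant $K_0=\theta\cdot10$ in $\mathrm{pen}(m)=K_0\|\pi\|_\infty D_{m_1}D_{m_2}/n$ is enough to guarantee $\theta p(m,m')\le\mathrm{pen}(m)+\mathrm{pen}(m')$.

```python
from fractions import Fraction as F

rho = F(3, 2)
theta = 3 * rho                                     # the choice theta = 3 rho
coef_bias = (theta + 2 * rho) / (theta - 2 * rho)   # in front of E||pi 1_A - pi_m||_n^2
coef_pen = 2 * theta / (theta - 2 * rho)            # in front of pen(m)
coef_rest = theta ** 2 / (theta - 2 * rho)          # in front of C_1 / n
K0 = 10 * theta                                     # theta * p(m,m') = K0 ||pi||_inf D(m,m') / n
print(coef_bias, coef_pen, coef_rest, K0)           # 5 6 27/2 45
```

The value $27/2$ is $9\rho$, the factor in front of $C_1/n$ in (15).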

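To bound the second term in (11), we recall (see Section 3) that $(\tilde\pi(X_i,y))_{1\le i\le n}$ is the orthogonal projection of $(\sum_k\psi_k(X_{i+1})\psi_k(y))_{1\le i\le n}$ on
$$W=\{(t(X_i,y))_{1\le i\le n},\ t\in S_{\hat m}\},$$
where $\psi_k=\psi_k^{\hat m}$. Denote by $P_W$ the orthogonal projection on $W$, and write $\psi_k(X_{i+1})=\mathbb E[\psi_k(X_{i+1})|X_i]+\varepsilon_{i,k}$. Then
$$(\tilde\pi(X_i,y))_{1\le i\le n}=P_W\Big(\big(\textstyle\sum_k\psi_k(X_{i+1})\psi_k(y)\big)_{1\le i\le n}\Big)$$
$$=P_W\Big(\big(\textstyle\sum_k\mathbb E[\psi_k(X_{i+1})|X_i]\psi_k(y)\big)_{1\le i\le n}\Big)+P_W\Big(\big(\textstyle\sum_k\varepsilon_{i,k}\psi_k(y)\big)_{1\le i\le n}\Big)$$
$$=P_W\big((\pi\mathbf 1_A(X_i,y))_{1\le i\le n}\big)+P_W\Big(\big(\textstyle\sum_k\varepsilon_{i,k}\psi_k(y)\big)_{1\le i\le n}\Big).$$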

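We denote by $\|\cdot\|_{\mathbb R^n}$ the Euclidean norm in $\mathbb R^n$, by $X$ the vector $(X_i)_{1\le i\le n}$ and by $\varepsilon_k$ the vector $(\varepsilon_{i,k})_{1\le i\le n}$. Thus
$$\|\pi\mathbf 1_A-\tilde\pi\|_n^2=\frac1n\int\Big\|\pi\mathbf 1_A(X,y)-P_W(\pi\mathbf 1_A(X,y))-P_W\Big(\sum_k\varepsilon_k\psi_k(y)\Big)\Big\|_{\mathbb R^n}^2dy$$
$$=\frac1n\int\big\|\pi\mathbf 1_A(X,y)-P_W(\pi\mathbf 1_A(X,y))\big\|_{\mathbb R^n}^2dy+\frac1n\int\Big\|P_W\Big(\sum_k\varepsilon_k\psi_k(y)\Big)\Big\|_{\mathbb R^n}^2dy$$
$$\le\frac1n\int\|\pi\mathbf 1_A(X,y)\|_{\mathbb R^n}^2dy+\frac1n\int\Big\|\sum_k\varepsilon_k\psi_k(y)\Big\|_{\mathbb R^n}^2dy$$
$$\le\frac1n\|\pi\|_\infty\sum_{i=1}^n\int\pi(X_i,y)dy+\frac1n\sum_{i=1}^n\int\Big(\sum_k\varepsilon_{i,k}\psi_k(y)\Big)^2dy$$
$$\le\|\pi\|_\infty+\frac1n\sum_{i=1}^n\sum_k\varepsilon_{i,k}^2.$$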

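But Assumption M2 implies $\|\sum_{k\in K_{\hat m}}\psi_k^2\|_\infty\le\phi_0^2D_{\hat m_2}$. So, using the definition of $\varepsilon_{i,k}$,
$$\varepsilon_{i,k}^2\le2\psi_k^2(X_{i+1})+2\mathbb E[\psi_k(X_{i+1})|X_i]^2$$
and
$$\sum_k\varepsilon_{i,k}^2\le2\sum_k\psi_k^2(X_{i+1})+2\mathbb E\Big[\sum_k\psi_k^2(X_{i+1})\Big|X_i\Big]\le4\phi_0^2D_{\hat m_2}.$$
Thus we obtain
$$\|\pi\mathbf 1_A-\tilde\pi\|_n^2\le\|\pi\|_\infty+4\phi_0^2D_{\hat m_2}\le\|\pi\|_\infty+4\phi_0^2n^{1/3}\qquad(16)$$
and, by taking the expectation, $\mathbb E\big(\|\pi\mathbf 1_A-\tilde\pi\|_n^2\mathbf 1_{\Omega_\rho^{*c}}\big)\le(\|\pi\|_\infty+4\phi_0^2n^{1/3})P(\Omega_\rho^{*c})$.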

We now remark that $P(\Omega_\rho^{*c})=P(\Omega^{*c})+P(\Omega_\rho^c\cap\Omega^*)$. In the geometric case $\beta_{2q_n}\le e^{-\gamma c\ln(n)}\le n^{-\gamma c}$, and in the other case $\beta_{2q_n}\le(2q_n)^{-\gamma}\le n^{-\gamma c}$. Then
$$P(\Omega^{*c})\le4p_n\beta_{2q_n}\le n^{1-\gamma c}.$$
But we have chosen $c$ such that $c\gamma>7/3$, and so $P(\Omega^{*c})\le n^{-4/3}$. Now we will use the following proposition:
Proposition 8 Let $\rho>1$. Then, under the assumptions of Theorem 2 or Theorem 4, there exists $C_2>0$ such that $P(\Omega_\rho^c\cap\Omega^*)\le\dfrac{C_2}{n^{7/3}}$.



This proposition implies that $\mathbb E\big(\|\pi\mathbf 1_A-\tilde\pi\|_n^2\mathbf 1_{\Omega_\rho^{*c}}\big)\le\dfrac Cn$ for some constant $C$.



Now we use (15) and observe that this inequality holds for all $m$ in $\mathcal M_n$, so
$$\mathbb E\|\tilde\pi-\pi\mathbf 1_A\|_n^2\le C\inf_{m\in\mathcal M_n}\big(\|\pi\mathbf 1_A-\pi_m\|^2+\mathrm{pen}(m)\big)+\frac{C'}n$$
with $C=\max(5\|f\|_\infty,6)$.



7.3 Proof of Corollary 3



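To control the bias term, we use the following lemma.

Lemma 9 Let $\pi_A$ belong to $B^{(\alpha_1,\alpha_2)}_{2,\infty}(A)$. We consider that $S'_m$ is one of the following spaces on $A$:

• a space of piecewise polynomials of degrees bounded by $s_i>\alpha_i-1$ ($i=1,2$), based on a partition with rectangles of sides $1/D_{m_1}$ and $1/D_{m_2}$,

• a linear span of $\{\phi_\lambda\psi_\mu,\ \lambda\in\cup_{j=0}^{m_1}\Lambda(j),\ \mu\in\cup_{k=0}^{m_2}M(k)\}$, where $\{\phi_\lambda\}$ and $\{\psi_\mu\}$ are orthonormal wavelet bases of respective regularities $s_1>\alpha_1-1$ and $s_2>\alpha_2-1$ (here $D_{m_i}=2^{m_i}$, $i=1,2$),

• the space of trigonometric polynomials with degree smaller than $D_{m_1}$ in the first direction and smaller than $D_{m_2}$ in the second direction.

Let $\pi'_m$ be the orthogonal projection of $\pi_A$ on $S'_m$. Then there exists a positive constant $C_0$ such that
$$\Big(\int_A|\pi_A-\pi'_m|^2\Big)^{1/2}\le C_0\big[D_{m_1}^{-\alpha_1}+D_{m_2}^{-\alpha_2}\big].$$

Proof: It is proved in [23] for $S'_m$ a space of wavelets or polynomials, and in [24] (p. 191 and 200) for a space of trigonometric polynomials, that this approximation error is bounded, up to a constant, by the moduli of smoothness of $\pi_A$ with step $1/D_{m_1}$ in the first direction and $1/D_{m_2}$ in the second direction. The definition of $B^{(\alpha_1,\alpha_2)}_{2,\infty}(A)$ then implies $(\int_A|\pi_A-\pi'_m|^2)^{1/2}\le C_0[D_{m_1}^{-\alpha_1}+D_{m_2}^{-\alpha_2}]$. □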

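If we choose for $S'_m$ the set of the restrictions to $A$ of the functions of $S_m$, and for $\pi_A$ the restriction of $\pi$ to $A$, we can apply Lemma 9. But $\pi'_m$ is also the restriction to $A$ of $\pi_m$, so that
$$\|\pi_A-\pi_m\|\le C_0\big[D_{m_1}^{-\alpha_1}+D_{m_2}^{-\alpha_2}\big].$$
According to Theorem 2,
$$\mathbb E\|\tilde\pi-\pi\mathbf 1_A\|_n^2\le C''\inf_{m\in\mathcal M_n}\Big\{D_{m_1}^{-2\alpha_1}+D_{m_2}^{-2\alpha_2}+\frac{D_{m_1}D_{m_2}}n\Big\}.$$
In particular, if $m^*$ is such that $D_{m^*_1}=\lfloor n^{\frac{\alpha_2}{\alpha_1+\alpha_2+2\alpha_1\alpha_2}}\rfloor$ and $D_{m^*_2}=\lfloor(D_{m^*_1})^{\alpha_1/\alpha_2}\rfloor$, then
$$\mathbb E\|\tilde\pi-\pi\mathbf 1_A\|_n^2\le C'''\Big[D_{m^*_1}^{-2\alpha_1}+\frac{D_{m^*_1}D_{m^*_2}}n\Big].$$
But the harmonic mean of $\alpha_1$ and $\alpha_2$ is $\bar\alpha=2\alpha_1\alpha_2/(\alpha_1+\alpha_2)$. Then $\mathbb E\|\tilde\pi-\pi\mathbf 1_A\|_n^2=O\big(n^{-\frac{2\bar\alpha}{2\bar\alpha+2}}\big)$.

The condition $D_{m_1}\le n^{1/3}$ allows this choice of $m$ only if $\frac{\alpha_2}{\alpha_1+\alpha_2+2\alpha_1\alpha_2}\le\frac13$, i.e. $\alpha_1-2\alpha_2+2\alpha_1\alpha_2\ge0$. In the same manner, the condition $\alpha_2-2\alpha_1+2\alpha_1\alpha_2\ge0$ must be verified.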
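The exponent bookkeeping in this proof can be double-checked with exact rational arithmetic. The sketch below uses function names of our own and takes $D_{m^*_1}\approx n^{\alpha_2/(\alpha_1+\alpha_2+2\alpha_1\alpha_2)}$ and $D_{m^*_2}\approx D_{m^*_1}^{\alpha_1/\alpha_2}$. It returns the exponents of $n$ in the bias and variance terms, compares them with $-2\bar\alpha/(2\bar\alpha+2)$, and encodes the two conditions above that guarantee $D_{m^*_i}\le n^{1/3}$.

```python
from fractions import Fraction as F

def rate_exponents(a1, a2):
    """Powers of n appearing in the bias/variance trade-off of Corollary 3."""
    denom = a1 + a2 + 2 * a1 * a2
    e1 = a2 / denom                   # D_{m*1} ~ n^{e1}
    e2 = a1 / denom                   # D_{m*2} ~ n^{e2} = D_{m*1}^{a1/a2}
    bias = -2 * a1 * e1               # D_{m*1}^{-2 a1} ~ n^{bias}
    var = e1 + e2 - 1                 # D_{m*1} D_{m*2} / n ~ n^{var}
    abar = 2 * a1 * a2 / (a1 + a2)    # harmonic mean of a1 and a2
    return e1, e2, bias, var, -2 * abar / (2 * abar + 2)

def dims_allowed(a1, a2):
    """The two conditions stated in the text for D_{m*i} <= n^{1/3}."""
    return a1 - 2 * a2 + 2 * a1 * a2 >= 0 and a2 - 2 * a1 + 2 * a1 * a2 >= 0

e1, e2, bias, var, rate = rate_exponents(F(2), F(1))
print(e1, e2, bias, var, rate)   # 1/7 2/7 -4/7 -4/7 -4/7
```

In this example the bias and variance exponents coincide, as they should at the optimal trade-off. Both also equal $-2\bar\alpha/(2\bar\alpha+2)$.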



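7.4 Proof of Theorem 4

We use the same notations as for the proof of Theorem 2. Let us write
$$\mathbb E\|\pi^*-\pi\mathbf 1_A\|^2=B_1+B_2+B_3$$
with
$$B_1=\mathbb E\big(\|\pi^*-\pi\mathbf 1_A\|^2\mathbf 1_{\Omega_\rho^*}\mathbf 1_{\|\tilde\pi\|\le k_n}\big),$$
$$B_2=\mathbb E\big(\|\pi^*-\pi\mathbf 1_A\|^2\mathbf 1_{\Omega_\rho^*}\mathbf 1_{\|\tilde\pi\|>k_n}\big),$$
$$B_3=\mathbb E\big(\|\pi^*-\pi\mathbf 1_A\|^2\mathbf 1_{\Omega_\rho^{*c}}\big).$$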



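To bound the first term, we observe that for all $m\in\mathcal M_n$, on $\Omega_\rho^*$, $\|\tilde\pi-\pi_m\|^2\le f_0^{-1}\rho\|\tilde\pi-\pi_m\|_n^2$. Then
$$\|\tilde\pi-\pi\mathbf 1_A\|^2\mathbf 1_{\Omega_\rho^*}\le2\|\tilde\pi-\pi_m\|^2\mathbf 1_{\Omega_\rho^*}+2\|\pi_m-\pi\mathbf 1_A\|^2$$
$$\le2f_0^{-1}\rho\|\tilde\pi-\pi_m\|_n^2\mathbf 1_{\Omega_\rho^*}+2\|\pi_m-\pi\mathbf 1_A\|^2$$
$$\le2f_0^{-1}\rho\big\{2\|\tilde\pi-\pi\mathbf 1_A\|_n^2\mathbf 1_{\Omega_\rho^*}+2\|\pi_m-\pi\mathbf 1_A\|_n^2\big\}+2\|\pi_m-\pi\mathbf 1_A\|^2.$$
Thus
$$B_1\le\mathbb E\big(\|\tilde\pi-\pi\mathbf 1_A\|^2\mathbf 1_{\Omega_\rho^*}\big)\le4f_0^{-1}\rho\,\mathbb E\big(\|\tilde\pi-\pi\mathbf 1_A\|_n^2\mathbf 1_{\Omega_\rho^*}\big)+\big(4f_0^{-1}\rho\|f\|_\infty+2\big)\|\pi_m-\pi\mathbf 1_A\|^2.$$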
But, using (15), we obtain
$$B_1\le\big(24f_0^{-1}\rho\|f\|_\infty+2\big)\|\pi_m-\pi\mathbf 1_A\|^2+24f_0^{-1}\rho\,\mathrm{pen}(m)+\frac{36f_0^{-1}\rho^2C_1}n.$$
Since $\rho=3/2$, by setting $C=\max(36f_0^{-1}\|f\|_\infty+2,\ 36f_0^{-1})$,
$$B_1\le C\{\|\pi_m-\pi\mathbf 1_A\|^2+\mathrm{pen}(m)\}+\frac{81f_0^{-1}C_1}n$$
for all $m\in\mathcal M_n$.

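Next, the definition of $\pi^*$ and the Markov inequality provide
$$B_2\le\mathbb E\big(\|\pi\mathbf 1_A\|^2\mathbf 1_{\Omega_\rho^*}\mathbf 1_{\|\tilde\pi\|>k_n}\big)\le\|\pi\|^2\frac{\mathbb E\big(\|\tilde\pi\|^2\mathbf 1_{\Omega_\rho^*}\big)}{k_n^2}.\qquad(17)$$
But $\|\tilde\pi\|^2\mathbf 1_{\Omega_\rho^*}\le\rho f_0^{-1}\|\tilde\pi\|_n^2\mathbf 1_{\Omega_\rho^*}\le2\rho f_0^{-1}\big(\|\tilde\pi-\pi\mathbf 1_A\|_n^2+\|\pi\mathbf 1_A\|_n^2\big)$. Now we use (16) to state
$$\|\tilde\pi\|^2\mathbf 1_{\Omega_\rho^*}\le2\rho f_0^{-1}\big(\|\pi\|_\infty+4\phi_0^2n^{1/3}+\|\pi\mathbf 1_A\|_n^2\big)$$
$$\le2\rho f_0^{-1}\Big(\|\pi\|_\infty+4\phi_0^2n^{1/3}+\frac1n\sum_{i=1}^n\|\pi\|_\infty\int\pi(X_i,y)dy\Big)$$
$$\le2\rho f_0^{-1}\big(2\|\pi\|_\infty+4\phi_0^2n^{1/3}\big).$$
Then, since $k_n=n^{2/3}$, (17) becomes
$$B_2\le\frac{2\rho f_0^{-1}\big(2\|\pi\|_\infty+4\phi_0^2n^{1/3}\big)\|\pi\|^2}{n^{4/3}}\le\frac{2\rho f_0^{-1}\big(2\|\pi\|_\infty+4\phi_0^2\big)\|\pi\|^2}n.$$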

Lastly,
$$B_3\le\mathbb E\big(2(\|\pi^*\|^2+\|\pi\mathbf 1_A\|^2)\mathbf 1_{\Omega_\rho^{*c}}\big)\le2(k_n^2+\|\pi\|^2)P(\Omega_\rho^{*c}).$$
We now remark that $P(\Omega_\rho^{*c})=P(\Omega^{*c})+P(\Omega_\rho^c\cap\Omega^*)$. In the geometric case $\beta_{2q_n}\le e^{-\gamma c\ln(n)}\le n^{-\gamma c}$, and in the other case $\beta_{2q_n}\le(2q_n)^{-\gamma}\le n^{-\gamma c}$. Then
$$P(\Omega^{*c})\le4p_n\beta_{2q_n}\le n^{1-\gamma c}.$$
But, if $\gamma>20$ in the arithmetic case, we can choose $c<1/6$ such that $c\gamma>10/3$, and so $P(\Omega^{*c})\le n^{-7/3}$. Then, using Proposition 8,
$$B_3\le2(n^{4/3}+\|\pi\|^2)\frac{1+C_2}{n^{7/3}}\le\frac{2(C_2+1)(1+\|\pi\|^2)}n.$$
7.5 Proof of Theorem

Let $\psi$ be a very regular wavelet with compact support. For $J=(j_1,j_2)\in\mathbb Z^2$ to be chosen below and $K=(k_1,k_2)\in\mathbb Z^2$, we set
$$\psi_{JK}(x,y)=2^{(j_1+j_2)/2}\psi(2^{j_1}x-k_1)\psi(2^{j_2}y-k_2).$$
Let $\pi_0(x,y)=c_0\mathbf 1_B(y)$, with $B$ a compact set such that $A\subset B\times B$ and $|B|\ge2|A|^{1/2}/L$, and $c_0=|B|^{-1}$. So $\pi_0$ is a transition density with $\|\pi_0\|_{B^{(\alpha_1,\alpha_2)}_{2,\infty}(A)}\le L/2$.

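Now let $R_J$ be the maximal subset of $\mathbb Z^2$ such that
$$\mathrm{Supp}(\psi_{JK})\subset A\quad\forall K\in R_J,\qquad\mathrm{Supp}(\psi_{JK})\cap\mathrm{Supp}(\psi_{JK'})=\emptyset\ \text{ if }K\neq K'.$$
The cardinal of $R_J$ is $|R_J|=c2^{j_1+j_2}$, with $c$ a positive constant which depends only on $A$ and on the support of $\psi$. For all $\varepsilon=(\varepsilon_K)\in\{-1,1\}^{|R_J|}$, let
$$\pi_\varepsilon=\pi_0+\frac1{\sqrt n}\sum_{K\in R_J}\varepsilon_K\psi_{JK}.$$
Let us denote by $\mathcal E$ the set of all such $\pi_\varepsilon$. Since $\int\psi=0$ and $\pi_0$ is a transition density, $\int\pi_\varepsilon(x,y)dy=1$ for all $x$ in $\mathbb R$. Additionally, $\pi_\varepsilon(x,y)=\pi_0(x,y)\ge0$ if $(x,y)\notin A$. If $(x,y)\in A$, then $\pi_\varepsilon\ge c_0-2^{(j_1+j_2)/2}\|\psi\|_\infty^2/\sqrt n$, and so $\pi_\varepsilon(x,y)\ge c_0/2>0$ as soon as
$$\frac{2^{(j_1+j_2)/2}}{\sqrt n}\le\frac{c_0}{2\|\psi\|_\infty^2}.\qquad(18)$$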



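Thus, if (18) holds, $\pi_\varepsilon(x,y)\ge(c_0/2)\mathbf 1_B(y)$ for all $y$. This implies that the underlying Markov chain is Doeblin recurrent, and hence positive recurrent. We verify that $f=c_0\mathbf 1_B$ is the stationary density. To prove that $\pi_\varepsilon\in\mathcal B$, we still have to compute $\|\pi_\varepsilon\|_{B^{(\alpha_1,\alpha_2)}_{2,\infty}(A)}$. Hochmuth [23] proves that, for $\psi$ smooth enough,
$$\Big\|\sum_{K\in R_J}\varepsilon_K\psi_{JK}\Big\|_{B^{(\alpha_1,\alpha_2)}_{2,\infty}(A)}\le(2^{j_1\alpha_1}+2^{j_2\alpha_2})\Big\|\sum_{K\in R_J}\varepsilon_K\psi_{JK}\Big\|_A.$$
Since
$$\Big\|\sum_{K\in R_J}\varepsilon_K\psi_{JK}\Big\|_A^2=\sum_{K\in R_J}|\varepsilon_K|^2=c2^{j_1+j_2},$$
we get
$$\|\pi_\varepsilon\|_{B^{(\alpha_1,\alpha_2)}_{2,\infty}(A)}\le\frac L2+(2^{j_1\alpha_1}+2^{j_2\alpha_2})\frac{\sqrt c\,2^{(j_1+j_2)/2}}{\sqrt n}.$$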



From now on, we suppose that Condition C is verified, where

Condition C: $\dfrac{(2^{j_1\alpha_1}+2^{j_2\alpha_2})\,2^{(j_1+j_2)/2}}{\sqrt n}\le\dfrac L{2c^{1/2}}$.

It entails in particular that (18) holds if $j_1$ and $j_2$ are large enough. Then $\pi_\varepsilon\in\mathcal B$ for all $\varepsilon$. We now use Lemma 10.2 p. 160 in Härdle et al. [25]. The likelihood ratio can be written



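$$\Lambda_n(\pi_{\varepsilon_{*K}},\pi_\varepsilon)=\prod_{i=1}^n\frac{\pi_{\varepsilon_{*K}}(X_i,X_{i+1})}{\pi_\varepsilon(X_i,X_{i+1})},$$
where $\varepsilon_{*K}$ is obtained from $\varepsilon$ by changing $\varepsilon_K$ into $-\varepsilon_K$. Note that $\pi_\varepsilon(X_i,X_{i+1})>0$ $P_{\pi_\varepsilon}$-almost surely (actually the chain "lives" on $B$). Then
$$\log\big(\Lambda_n(\pi_{\varepsilon_{*K}},\pi_\varepsilon)\big)=\sum_{i=1}^n\log\Big(1-\frac{2\varepsilon_K\psi_{JK}(X_i,X_{i+1})}{\sqrt n\,\pi_\varepsilon(X_i,X_{i+1})}\Big).$$
We set $U_{JK}(X_i,X_{i+1})=-\varepsilon_K\psi_{JK}(X_i,X_{i+1})/\pi_\varepsilon(X_i,X_{i+1})$, so that
$$\log\big(\Lambda_n(\pi_{\varepsilon_{*K}},\pi_\varepsilon)\big)=\sum_{i=1}^n\log\Big(1+\frac2{\sqrt n}U_{JK}(X_i,X_{i+1})\Big)$$
$$=\sum_{i=1}^n\theta\Big(\frac2{\sqrt n}U_{JK}(X_i,X_{i+1})\Big)+\sum_{i=1}^n\frac2{\sqrt n}U_{JK}(X_i,X_{i+1})-\sum_{i=1}^n\frac2nU_{JK}^2(X_i,X_{i+1})=u_n+v_n-w_n,$$
with $\theta$ the function defined by $\theta(u)=\log(1+u)-u+u^2/2$. Now we prove the three following assertions: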



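1° $\mathbb E_{\pi_\varepsilon}(|u_n|)=\mathbb E_{\pi_\varepsilon}\Big(\Big|\sum_{i=1}^n\theta\Big(\frac2{\sqrt n}U_{JK}(X_i,X_{i+1})\Big)\Big|\Big)\xrightarrow[n\to\infty]{}0$

2° $\mathbb E_{\pi_\varepsilon}(w_n)=\mathbb E_{\pi_\varepsilon}\Big(\frac2n\sum_{i=1}^nU_{JK}^2(X_i,X_{i+1})\Big)\le4$

3° $\mathbb E_{\pi_\varepsilon}(v_n^2)=\mathbb E_{\pi_\varepsilon}\Big(\Big|\frac2{\sqrt n}\sum_{i=1}^nU_{JK}(X_i,X_{i+1})\Big|^2\Big)\le8$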



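1°: First we observe that
$$\Big|\frac2{\sqrt n}U_{JK}\Big|\le\frac2{\sqrt n}\,\frac{2^{(j_1+j_2)/2}\|\psi\|_\infty^2}{c_0/2}=\frac{4\|\psi\|_\infty^2}{c_0}\Big(\frac{2^{j_1+j_2}}n\Big)^{1/2}$$
and that $2^{j_1+j_2}/n\to0$ since Condition C holds. So there exists some integer $n_0$ such that, for all $n\ge n_0$ and all $x,y$, $|\theta(2U_{JK}(x,y)/\sqrt n)|\le|2U_{JK}(x,y)/\sqrt n|^3$. But
$$\iint\Big|\frac{2U_{JK}(x,y)}{\sqrt n}\Big|^3f(x)\pi_\varepsilon(x,y)dxdy=\frac8{n^{3/2}}\iint\frac{|\psi_{JK}(x,y)|^3}{\pi_\varepsilon^2(x,y)}f(x)dxdy$$
$$\le\frac8{n^{3/2}}\,2^{(j_1+j_2)/2}\|\psi\|_\infty^2\,\frac{c_0}{(c_0/2)^2}\iint\psi_{JK}^2(x,y)dxdy=\frac{32\|\psi\|_\infty^2}{c_0}\,\frac{2^{(j_1+j_2)/2}}{n^{3/2}}.$$
Then
$$\mathbb E_{\pi_\varepsilon}|u_n|\le\frac{32\|\psi\|_\infty^2}{c_0}\Big(\frac{2^{j_1+j_2}}n\Big)^{1/2}\xrightarrow[n\to\infty]{}0.$$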
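The step $|\theta(u)|\le|u|^3$ for small $|u|$ can be made explicit. From $\theta(u)=u^3/3-u^4/4+u^5/5-\dots$ one gets $|\theta(u)|\le|u|^3(1/3+1/8+1/20+\dots)<|u|^3$ as soon as $|u|\le1/2$. A quick numerical confirmation in Python follows. The grid avoids a neighbourhood of $0$, where the three terms of $\theta$ cancel and floating-point rounding would dominate the ratio.

```python
import numpy as np

def theta(u):
    """theta(u) = log(1 + u) - u + u**2 / 2, so that log(1 + u) = u - u**2/2 + theta(u)."""
    return np.log1p(u) - u + u ** 2 / 2

u = np.concatenate([np.linspace(-0.5, -0.01, 2000), np.linspace(0.01, 0.5, 2000)])
ratio = np.abs(theta(u)) / np.abs(u) ** 3
print(ratio.max())   # about 0.545, attained at u = -1/2
```

So it suffices that $|2U_{JK}/\sqrt n|\le1/2$, which holds for $n\ge n_0$ by the first bound of 1°.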
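2°: We bound the expectation of $U_{JK}(X_i,X_{i+1})^2$:
$$\mathbb E_{\pi_\varepsilon}\big(U_{JK}^2(X_i,X_{i+1})\big)=\iint\frac{\psi_{JK}^2(x,y)}{\pi_\varepsilon(x,y)}f(x)dxdy\le c_0\iint\frac{\psi_{JK}^2(x,y)}{c_0/2}dxdy\le2.\qquad(19)$$
And then $\mathbb E_{\pi_\varepsilon}(w_n)=\mathbb E_{\pi_\varepsilon}\big((2/n)\sum_{i=1}^nU_{JK}(X_i,X_{i+1})^2\big)\le4$.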



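3°: We observe that $\mathbb E_{\pi_\varepsilon}(U_{JK}(X_i,X_{i+1})|X_1,\dots,X_i)=0$, and thus $\big(\sum_{i=1}^kU_{JK}(X_i,X_{i+1})\big)_{k\ge1}$ is a martingale. A classic property of square integrable martingales gives
$$\mathbb E_{\pi_\varepsilon}\Big[\Big(\sum_{i=1}^nU_{JK}(X_i,X_{i+1})\Big)^2\Big]=\sum_{i=1}^n\mathbb E_{\pi_\varepsilon}\big[U_{JK}^2(X_i,X_{i+1})\big].$$
Thus, using (19), $\mathbb E_{\pi_\varepsilon}(v_n^2)=(4/n)\sum_{i=1}^n\mathbb E_{\pi_\varepsilon}[U_{JK}(X_i,X_{i+1})^2]\le8$.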

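We deduce easily from the three previous assertions 1°, 2° and 3° that there exist $\lambda>0$ and $p_0>0$ such that $P_{\pi_\varepsilon}\big(\Lambda_n(\pi_{\varepsilon_{*K}},\pi_\varepsilon)>e^{-\lambda}\big)\ge p_0$. Thus, according to Lemma 10.2 in [25],
$$\max_{\pi_\varepsilon\in\mathcal E}\mathbb E_{\pi_\varepsilon}\|\hat\pi_n-\pi_\varepsilon\|_A^2\ge\frac{|R_J|}2\,\delta^2e^{-\lambda}p_0,$$
where $\delta=\inf_{\varepsilon\neq\varepsilon'}\|\pi_\varepsilon-\pi_{\varepsilon'}\|_A/2=\|\varepsilon_K\psi_{JK}/\sqrt n\|_A=1/\sqrt n$.
Now, for all $n$, we choose $J=J(n)=(j_1(n),j_2(n))$ such that
$$\frac{c_1}2\le2^{j_1}n^{-\frac{\alpha_2}{\alpha_1+\alpha_2+2\alpha_1\alpha_2}}\le c_1\quad\text{and}\quad\frac{c_2}2\le2^{j_2}n^{-\frac{\alpha_1}{\alpha_1+\alpha_2+2\alpha_1\alpha_2}}\le c_2,$$
with $c_1$ and $c_2$ such that $(c_1^{\alpha_1}+c_2^{\alpha_2})\sqrt{c_1c_2}\le L/(2c^{1/2})$, so that Condition C is satisfied. Moreover, we have
$$\frac{2^{j_1+j_2}}n\ge\frac{c_1c_2}4\,n^{\frac{\alpha_1+\alpha_2}{\alpha_1+\alpha_2+2\alpha_1\alpha_2}-1}=\frac{c_1c_2}4\,n^{-\frac{2\alpha_1\alpha_2}{\alpha_1+\alpha_2+2\alpha_1\alpha_2}}.$$
Thus
$$\max_{\pi_\varepsilon\in\mathcal E}\mathbb E_{\pi_\varepsilon}\|\hat\pi_n-\pi_\varepsilon\|_A^2\ge\frac{ce^{-\lambda}p_0c_1c_2}8\,n^{-\frac{2\alpha_1\alpha_2}{\alpha_1+\alpha_2+2\alpha_1\alpha_2}}.$$
And then, for every estimator $\hat\pi_n$,
$$\sup_{\pi\in\mathcal B}\mathbb E_\pi\|\hat\pi_n-\pi\|_A^2\ge Cn^{-\frac{2\bar\alpha}{2\bar\alpha+2}}$$
with $C=ce^{-\lambda}p_0c_1c_2/8$.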
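The choice of $J(n)$ above works because of two exponent identities. First, both terms of Condition C stay bounded. Second, $2^{j_1+j_2}/n$ has exactly the order of the rate. The small Python check below verifies both in exact arithmetic. It writes $2^{j_1}\asymp n^{s_1}$ and $2^{j_2}\asymp n^{s_2}$, and the function name is ours.

```python
from fractions import Fraction as F

def lower_bound_exponents(a1, a2):
    """Powers of n in the lower-bound proof when 2^{j1} ~ n^{s1}, 2^{j2} ~ n^{s2},
    with s1 = a2/d, s2 = a1/d and d = a1 + a2 + 2 a1 a2."""
    d = a1 + a2 + 2 * a1 * a2
    s1, s2 = a2 / d, a1 / d
    # Condition C: 2^{j1 a1} 2^{(j1+j2)/2} / sqrt(n) and 2^{j2 a2} 2^{(j1+j2)/2} / sqrt(n)
    cond_c = (a1 * s1 + (s1 + s2) / 2 - F(1, 2),
              a2 * s2 + (s1 + s2) / 2 - F(1, 2))
    # |R_J| delta^2 is of order 2^{j1+j2} / n
    risk = s1 + s2 - 1
    return cond_c, risk

cond_c, risk = lower_bound_exponents(F(2), F(1))
print(*cond_c, risk)   # 0 0 -4/7
```

The zero exponents mean that Condition C reduces to the constraint on $c_1,c_2$ stated above. The exponent of $2^{j_1+j_2}/n$ is $-2\alpha_1\alpha_2/(\alpha_1+\alpha_2+2\alpha_1\alpha_2)$.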
7.6 Proof of Proposition 7



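Let
$$\Gamma_i(t)=t(X_i,X_{i+1})-\int t(X_i,y)\pi(X_i,y)dy,$$
$$\Gamma_i^*(t)=t(X_i^*,X_{i+1}^*)-\int t(X_i^*,y)\pi(X_i^*,y)dy,$$
$$\Gamma_i^{**}(t)=t(X_i^{**},X_{i+1}^{**})-\int t(X_i^{**},y)\pi(X_i^{**},y)dy.$$
We now define $Z_n^*(t)$:
$$Z_n^*(t)=\frac1n\Big[\sum_{i\ \mathrm{odd}}\Gamma_i^*(t)+\sum_{i\ \mathrm{even}}\Gamma_i^{**}(t)\Big].$$
Let us remark that $Z_n^*(t)\mathbf 1_{\Omega^*}=Z_n(t)\mathbf 1_{\Omega^*}$. Next, we split each of these terms:
$$Z_{n,1}^*(t)=\frac1n\sum_{l=0}^{p_n-1}\sum_{\substack{i=4lq_n+1\\ i\ \mathrm{odd}}}^{2(2l+1)q_n-1}\Gamma_i^*(t),\qquad Z_{n,2}^*(t)=\frac1n\sum_{l=0}^{p_n-1}\sum_{\substack{i=2(2l+1)q_n+1\\ i\ \mathrm{odd}}}^{2(2l+2)q_n-1}\Gamma_i^*(t),$$
$$Z_{n,3}^*(t)=\frac1n\sum_{l=0}^{p_n-1}\sum_{\substack{i=4lq_n+2\\ i\ \mathrm{even}}}^{2(2l+1)q_n}\Gamma_i^{**}(t),\qquad Z_{n,4}^*(t)=\frac1n\sum_{l=0}^{p_n-1}\sum_{\substack{i=2(2l+1)q_n+2\\ i\ \mathrm{even}}}^{2(2l+2)q_n}\Gamma_i^{**}(t).$$
We use the following lemma: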

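Lemma 10 (Talagrand) Let $U_0,\dots,U_{N-1}$ be i.i.d. variables and $(\zeta_t)_{t\in B}$ a set of functions. Let $G(t)=\frac1N\sum_{l=0}^{N-1}\zeta_t(U_l)$. We suppose that
(1) $\sup_{t\in B}\|\zeta_t\|_\infty\le M_1$, (2) $\mathbb E\big(\sup_{t\in B}|G(t)|\big)\le H$, (3) $\sup_{t\in B}\mathrm{Var}[\zeta_t(U_0)]\le v$.
Then there exist $K>0$, $K_1>0$, $K_2>0$ such that
$$\mathbb E\Big[\sup_{t\in B}G^2(t)-10H^2\Big]_+\le K\Big[\frac vNe^{-K_1N\frac{H^2}v}+\frac{M_1^2}{N^2}e^{-K_2N\frac H{M_1}}\Big].$$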

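Here $N=p_n$, $B=B_f(m')$ and, for $l\in\{0,\dots,p_n-1\}$, $U_l=(X^*_{4lq_n+1},\dots,X^*_{2(2l+1)q_n})$ and
$$\zeta_t(U_l)=\frac1{q_n}\sum_{\substack{i=4lq_n+1\\ i\ \mathrm{odd}}}^{2(2l+1)q_n-1}\Gamma_i^*(t).$$
Then
$$G(t)=\frac1{p_n}\sum_{l=0}^{p_n-1}\frac1{q_n}\sum_{\substack{i=4lq_n+1\\ i\ \mathrm{odd}}}^{2(2l+1)q_n-1}\Gamma_i^*(t)=4Z_{n,1}^*(t).$$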
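We now compute $M_1$, $H$ and $v$.

(1) We recall that $S_m+S_{m'}$ is included in the model $S_{m''}$ with dimension $\max(D_{m_1},D_{m'_1})\max(D_{m_2},D_{m'_2})$. Each of the $q_n$ terms $\Gamma_i^*(t)$ in $\zeta_t(U_l)$ is bounded by $\|t\|_\infty(1+\int\pi(x,y)dy)=2\|t\|_\infty$. Moreover $\|t\|\le f_0^{-1/2}\|t\|_f=f_0^{-1/2}$ for $t\in B$. So
$$\sup_{t\in B}\|\zeta_t\|_\infty\le2\sup_{t\in B}\|t\|_\infty\le2\phi_0\sqrt{\max(D_{m_1},D_{m'_1})\max(D_{m_2},D_{m'_2})}\,\sup_{t\in B}\|t\|\le\frac{2\phi_0}{\sqrt{f_0}}n^{1/3}.$$
Then we set $M_1=\dfrac{2\phi_0}{\sqrt{f_0}}n^{1/3}$.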



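(2) Since $A_0$ and $A_0^*$ have the same distribution, $\zeta_t(U_0)=\frac1{q_n}\sum_{i=1,\,i\ \mathrm{odd}}^{2q_n-1}\Gamma_i^*(t)$ has the same distribution as $\frac1{q_n}\sum_{i=1,\,i\ \mathrm{odd}}^{2q_n-1}\Gamma_i(t)$. We observe that $\mathbb E(\Gamma_i(t)|X_i)=0$. Hence, for every set $I$ of indices,
$$\mathbb E\Big[\Big(\sum_{i\in I}\Gamma_i(t)\Big)^2\Big]=2\mathbb E\Big(\sum_{i<j}\mathbb E[\Gamma_i(t)\Gamma_j(t)|X_1,\dots,X_j]\Big)+\sum_{i\in I}\mathbb E[\Gamma_i^2(t)]$$
$$=2\mathbb E\Big(\sum_{i<j}\Gamma_i(t)\,\mathbb E[\Gamma_j(t)|X_j]\Big)+\sum_{i\in I}\mathbb E[\Gamma_i^2(t)]=\sum_{i\in I}\mathbb E[\Gamma_i^2(t)].$$
In particular
$$\mathrm{Var}[\zeta_t(U_0)]=\mathbb E\Big[\Big(\frac1{q_n}\sum_{\substack{i=1\\ i\ \mathrm{odd}}}^{2q_n-1}\Gamma_i(t)\Big)^2\Big]=\frac1{q_n^2}\sum_{\substack{i=1\\ i\ \mathrm{odd}}}^{2q_n-1}\mathbb E[\Gamma_i^2(t)]\le\frac1{q_n^2}\sum_{\substack{i=1\\ i\ \mathrm{odd}}}^{2q_n-1}\mathbb E[t^2(X_i,X_{i+1})]\le\frac1{q_n}\|\pi\|_\infty\|t\|_f^2.$$
Then $v=\|\pi\|_\infty/q_n$.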



(3) Let $(\varphi_j\otimes\psi_k)_{(j,k)\in\Lambda(m,m')}$ be an orthonormal basis of $(S_m+S_{m'},\|\cdot\|_f)$. Then
$$\mathbb E\Big(\sup_{t\in B}G^2(t)\Big)\le\sum_{j,k}\mathbb E\big(G^2(\varphi_j\otimes\psi_k)\big)\le\sum_{j,k}\frac{16}{n^2}\sum_{l=0}^{p_n-1}\mathbb E\Big[\Big(\sum_{\substack{i=4lq_n+1\\ i\ \mathrm{odd}}}^{2(2l+1)q_n-1}\Gamma_i^*(\varphi_j\otimes\psi_k)\Big)^2\Big],$$
where we used the independence of the $A_l^*$. Now we can replace $\Gamma_i^*$ by $\Gamma_i$ in the sum, because $A_l$ and $A_l^*$ have the same distribution, and we use as previously the martingale property of the $\Gamma_i$:
$$\mathbb E\Big(\sup_{t\in B}G^2(t)\Big)\le\sum_{j,k}\frac{16}{n^2}\sum_{l=0}^{p_n-1}\sum_{\substack{i=4lq_n+1\\ i\ \mathrm{odd}}}^{2(2l+1)q_n-1}\mathbb E\big(\Gamma_i^2(\varphi_j\otimes\psi_k)\big)\le\sum_{j,k}\frac{16}{n^2}\cdot\frac n4\|\pi\|_\infty=4\|\pi\|_\infty\frac{D(m,m')}n.$$
Then $\mathbb E^2\big(\sup_{t\in B}|G(t)|\big)\le4\|\pi\|_\infty\dfrac{D(m,m')}n$, and we set $H^2=4\|\pi\|_\infty\dfrac{D(m,m')}n$.
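For the reader's convenience, the quantities that enter Lemma 10 and the exponents they produce can be collected as follows. This is a worked check of the computation above, with $N=p_n=n/(4q_n)$ and $M_1$, $H$, $v$ as obtained in (1) to (3).

```latex
N=\frac{n}{4q_n},\qquad H^2=\frac{4\|\pi\|_\infty D(m,m')}{n},\qquad
v=\frac{\|\pi\|_\infty}{q_n},\qquad M_1=\frac{2\phi_0}{\sqrt{f_0}}\,n^{1/3},

\frac{NH^2}{v}=D(m,m'),\qquad
\frac{NH}{M_1}=\frac{\sqrt{f_0\|\pi\|_\infty}}{4\phi_0}\,\frac{n^{1/6}\sqrt{D(m,m')}}{q_n},

\frac{v}{N}=\frac{4\|\pi\|_\infty}{n},\qquad
\frac{M_1^2}{N^2}=\frac{64\,\phi_0^2}{f_0}\,\frac{q_n^2}{n^{4/3}},\qquad
\frac{10H^2}{16}=\frac{p(m,m')}{4}.
```

The last identity explains why $p(m,m')/4$ appears once the bound on $(4Z^*_{n,1})^2$ is divided by 16.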



According to Lemma 10, there exist $K'>0$, $K'_1>0$, $K'_2>0$ such that
$$\mathbb E\Big[\sup_{t\in B_f(m')}(4Z_{n,1}^*)^2(t)-10H^2\Big]_+\le K'\Big[\frac1ne^{-K'_1D(m,m')}+\frac{q_n^2}{n^{4/3}}e^{-K'_2n^{1/6}\sqrt{D(m,m')}/q_n}\Big].$$



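But $q_n\le n^c$ with $c<1/6$. So
$$\sum_{m'\in\mathcal M_n}\mathbb E\Big[\sup_{t\in B_f(m')}Z_{n,1}^{*2}(t)-\frac{p(m,m')}4\Big]_+\le\frac{K'}{16n}\Big[\sum_{m'\in\mathcal M_n}e^{-K'_1D(m,m')}+n^{2c-1/3}|\mathcal M_n|e^{-K'_2n^{1/6-c}}\Big]\le\frac Cn.\qquad(20)$$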



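In the same way, $\sum_{m'\in\mathcal M_n}\mathbb E\big[\sup_{t\in B_f(m')}Z_{n,r}^{*2}(t)-p(m,m')/4\big]_+\le C/n$ for $r=2,3,4$. And then, since $Z_n^*=\sum_{r=1}^4Z_{n,r}^*$ and $(\sum_{r=1}^4z_r)^2\le4\sum_{r=1}^4z_r^2$,
$$\sum_{m'\in\mathcal M_n}\mathbb E\Big(\Big[\sup_{t\in B_f(m')}Z_n^2(t)-p(m,m')\Big]_+\mathbf 1_{\Omega^*}\Big)\le\sum_{m'\in\mathcal M_n}\mathbb E\Big[\sup_{t\in B_f(m')}Z_n^{*2}(t)-p(m,m')\Big]_+$$
$$\le4\sum_{r=1}^4\sum_{m'\in\mathcal M_n}\mathbb E\Big[\sup_{t\in B_f(m')}Z_{n,r}^{*2}(t)-\frac{p(m,m')}4\Big]_+\le\frac{16C}n.$$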



7.7 Proof of Proposition 8



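First we observe that
$$P(\Omega_\rho^c\cap\Omega^*)\le P\Big(\sup_{t\in\mathcal B}|\nu_n(t^2)|>1-\frac1\rho\Big),$$
where $\nu_n(t)=\frac1n\sum_{i=1}^n\int[t(X_i,y)-\mathbb E(t(X_i,y))]dy$ and $\mathcal B=\{t\in S,\ \|t\|_f=1\}$. But, if $t=\sum_{j,k}a_{j,k}\varphi_j(x)\psi_k(y)$, then
$$\nu_n(t^2)=\sum_{j,j'}\sum_ka_{j,k}a_{j',k}\,\upsilon_n(\varphi_j\varphi_{j'}),$$
where
$$\upsilon_n(u)=\frac1n\sum_{i=1}^n[u(X_i)-\mathbb E(u(X_i))].\qquad(21)$$
Let $b_j=(\sum_ka_{j,k}^2)^{1/2}$. Then $|\nu_n(t^2)|\le\sum_{j,j'}b_jb_{j'}|\upsilon_n(\varphi_j\varphi_{j'})|$ and, if $t\in\mathcal B$,
$$\sum_jb_j^2=\sum_j\sum_ka_{j,k}^2=\|t\|^2\le f_0^{-1}.$$
Thus
$$\sup_{t\in\mathcal B}|\nu_n(t^2)|\le f_0^{-1}\sup_{\sum b_j^2=1}\sum_{j,j'}|b_jb_{j'}|\,|\upsilon_n(\varphi_j\varphi_{j'})|.$$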

Lemma 11 Let $B_{j,j'}=\|\varphi_j\varphi_{j'}\|_\infty$ and $V_{j,j'}=\|\varphi_j\varphi_{j'}\|_2$. For any symmetric matrix $(A_{j,j'})$, let
$$\bar\rho(A)=\sup_{\sum a_j^2=1}\sum_{j,j'}|a_ja_{j'}|A_{j,j'}$$
and $L(\varphi)=\max\{\bar\rho^2(V),\bar\rho(B)\}$. Then, if M2 is satisfied, $L(\varphi)\le\phi_0^2D_n^2$.

This lemma is proved in Baraud et al. [26].
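The quantity $\sup_{\sum a_j^2=1}\sum_{j,j'}|a_ja_{j'}|A_{j,j'}$ of Lemma 11 is easy to compute when $A$ has nonnegative entries, which is the case for $B$ and $V$. Only $|a|$ matters, so the supremum equals the largest eigenvalue of $A$, because Perron-Frobenius provides a nonnegative top eigenvector. The Python sketch below uses this fact. It evaluates $L(\varphi)$ for the histogram basis on $[0,1]$, an illustrative choice of ours for which $\|\sum_j\varphi_j^2\|_\infty=D$, i.e. $\phi_0=1$. Function names are hypothetical.

```python
import numpy as np

def rho_bar(M):
    """sup over unit vectors a of sum_{j,l} |a_j a_l| M[j, l], for a symmetric
    matrix M with nonnegative entries: the largest eigenvalue of M."""
    return np.linalg.eigvalsh(M)[-1]

def L_histogram(D):
    """L(phi) = max(rho_bar(V)^2, rho_bar(B)) for phi_j = sqrt(D) 1_[j/D,(j+1)/D);
    the products phi_j phi_l vanish when j != l."""
    B = D * np.eye(D)             # ||phi_j^2||_inf = D
    V = np.sqrt(D) * np.eye(D)    # ||phi_j^2||_2 = sqrt(D)
    return max(rho_bar(V) ** 2, rho_bar(B))

print([float(round(L_histogram(D), 6)) for D in (1, 2, 5, 10)])   # [1.0, 2.0, 5.0, 10.0]
```

For this basis $L(\varphi)=D$, well below the general bound $\phi_0^2D^2$ of the lemma.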

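Let $x=\dfrac{f_0^2(1-1/\rho)^2}{40\|f\|_\infty L(\varphi)}$ and
$$\Delta=\Big\{\forall j\,\forall j',\ |\upsilon_n(\varphi_j\varphi_{j'})|\le4\big[B_{j,j'}x+V_{j,j'}\sqrt{2\|f\|_\infty x}\big]\Big\}.$$
On $\Delta$:
$$\sup_{t\in\mathcal B}|\nu_n(t^2)|\le4f_0^{-1}\sup_{\sum b_j^2=1}\sum_{j,j'}|b_jb_{j'}|\big(B_{j,j'}x+V_{j,j'}\sqrt{2\|f\|_\infty x}\big)\le4f_0^{-1}\big(\bar\rho(B)x+\bar\rho(V)\sqrt{2\|f\|_\infty x}\big)$$
$$\le\Big(1-\frac1\rho\Big)\Big[\frac{f_0(1-1/\rho)\bar\rho(B)}{10\|f\|_\infty L(\varphi)}+\frac2{\sqrt5}\Big(\frac{\bar\rho^2(V)}{L(\varphi)}\Big)^{1/2}\Big]\le\Big(1-\frac1\rho\Big)\Big(\frac1{10}+\frac2{\sqrt5}\Big)<1-\frac1\rho.$$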



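Then $P\big(\sup_{t\in\mathcal B}|\nu_n(t^2)|>1-1/\rho\big)\le P(\Delta^c)$. But, on $\Omega^*$, $\upsilon_n(u)=\frac12\upsilon_{n,1}(u)+\frac12\upsilon_{n,2}(u)$ with
$$\upsilon_{n,r}(u)=\frac1{p_n}\sum_{l=0}^{p_n-1}Y_{l,r}(u),\quad r=1,2,$$
$$Y_{l,1}(u)=\frac1{2q_n}\sum_{i=4lq_n+1}^{2(2l+1)q_n}\big[u(X_i^*)-\mathbb E(u(X_i^*))\big],\qquad Y_{l,2}(u)=\frac1{2q_n}\sum_{i=2(2l+1)q_n+1}^{2(2l+2)q_n}\big[u(X_i^*)-\mathbb E(u(X_i^*))\big].$$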



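To bound $P\big(|\upsilon_{n,1}(\varphi_j\varphi_{j'})|>B_{j,j'}x+V_{j,j'}\sqrt{2\|f\|_\infty x}\big)$, we will use the Bernstein inequality given in Birgé and Massart [27]. This is why we bound $\mathbb E|Y_{l,1}(u)|^m$:
$$\mathbb E|Y_{l,1}(u)|^m\le(2\|u\|_\infty)^{m-2}\,\mathbb E\Big[\frac1{4q_n^2}\Big(\sum_{i=4lq_n+1}^{2(2l+1)q_n}\big[u(X_i^*)-\mathbb E(u(X_i^*))\big]\Big)^2\Big]\le(2\|u\|_\infty)^{m-2}\,\mathbb E\Big[\frac1{2q_n}\sum_{i=4lq_n+1}^{2(2l+1)q_n}\big[u(X_i)-\mathbb E(u(X_i))\big]^2\Big],$$
since the block $A_l^*$ has the same distribution as $A_l$, and the $X_i$ have the same distribution as $X_1$. Thus
$$\mathbb E|Y_{l,1}(u)|^m\le(2\|u\|_\infty)^{m-2}\,\mathbb E|u(X_1)-\mathbb E(u(X_1))|^2\le(2\|u\|_\infty)^{m-2}\int u^2(x)f(x)dx\le2^{m-2}(\|u\|_\infty)^{m-2}\big(\sqrt{\|f\|_\infty}\|u\|_2\big)^2.\qquad(22)$$
With $u=\varphi_j\varphi_{j'}$, $\mathbb E|Y_{l,1}(\varphi_j\varphi_{j'})|^m\le2^{m-2}(B_{j,j'})^{m-2}\big(\sqrt{\|f\|_\infty}V_{j,j'}\big)^2$. And then
$$P\big(|\upsilon_{n,1}(\varphi_j\varphi_{j'})|>B_{j,j'}x+V_{j,j'}\sqrt{2\|f\|_\infty x}\big)\le2e^{-p_nx}.$$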
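For the reader's convenience, here is the version of Bernstein's inequality that this step relies on. It is written in a standard form for independent variables and in our notation, rather than quoted from [27]. The second display shows how (22) fits its moment condition.

```latex
\text{If } Y_0,\dots,Y_{N-1} \text{ are independent and centred, and }
\mathbb E|Y_l|^m\le\tfrac{m!}{2}\,v\,c^{m-2}\ \ (m\ge2),\ \text{ then for all } x>0:
\qquad
\mathbb P\Big(\Big|\frac1N\sum_{l=0}^{N-1}Y_l\Big|\ge\sqrt{2vx}+cx\Big)\le2e^{-Nx}.

\text{Here } N=p_n,\ Y_l=Y_{l,1}(\varphi_j\varphi_{j'}),\ v=\|f\|_\infty V_{j,j'}^2,\ c=B_{j,j'},
\ \text{ since by (22) and } 2^{m-2}\le\tfrac{m!}{2}:\quad
\mathbb E|Y_{l,1}(\varphi_j\varphi_{j'})|^m\le\tfrac{m!}{2}\,\|f\|_\infty V_{j,j'}^2\,B_{j,j'}^{m-2},
\qquad\sqrt{2vx}+cx=B_{j,j'}x+V_{j,j'}\sqrt{2\|f\|_\infty x}.
```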
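Given that
$$P(\Omega_\rho^c\cap\Omega^*)\le P(\Delta^c)\le\sum_{j,j'}\sum_{r=1}^2P\big(|\upsilon_{n,r}(\varphi_j\varphi_{j'})|>B_{j,j'}x+V_{j,j'}\sqrt{2\|f\|_\infty x}\big),$$
we obtain
$$P(\Omega_\rho^c\cap\Omega^*)\le4D_n^2\exp\Big\{-p_n\frac{f_0^2(1-1/\rho)^2}{40\|f\|_\infty L(\varphi)}\Big\}\le4n^{2/3}\exp\Big\{-\frac{f_0^2(1-1/\rho)^2\,n}{160\|f\|_\infty L(\varphi)\,q_n}\Big\}.$$
But $L(\varphi)\le\phi_0^2D_n^2\le\phi_0^2n^{2/3}$ and $q_n\le n^{1/6}$, so
$$P(\Omega_\rho^c\cap\Omega^*)\le4n^{2/3}\exp\Big\{-\frac{f_0^2(1-1/\rho)^2}{160\|f\|_\infty\phi_0^2}\,n^{1/6}\Big\}\le\frac{C_2}{n^{7/3}}.\qquad(23)$$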
Appendix: random penalty

Here we prove that Theorem 2 remains valid with a penalty which does not depend on $\|\pi\|_\infty$.

Theorem 12 We consider the following penalty:
$$\mathrm{pen}(m)=K\|\hat\pi\|_\infty\frac{D_{m_1}D_{m_2}}n,$$
where $K$ is a numerical constant and $\hat\pi=\hat\pi_{m^*}$, with $S_{m^*}$ a space of trigonometric polynomials such that
$$\ln n\le D_{m^*_1}=D_{m^*_2}\le n^{1/6}.$$
Suppose that the restriction of $\pi$ to $A$ belongs to $B^{(\alpha_1,\alpha_2)}_{2,\infty}(A)$ with $\alpha_1>3/2$ and $\alpha_2>\max\big(\frac{\alpha_1}{2\alpha_1-3},\frac{3\alpha_1}{2\alpha_1-1}\big)$. Then, under the assumptions of Theorem 2, for $n$ large enough,
$$\mathbb E\|\tilde\pi-\pi\mathbf 1_A\|_n^2\le C\inf_{m\in\mathcal M_n}\Big\{d^2(\pi\mathbf 1_A,S_m)+\frac{D_{m_1}D_{m_2}}n\Big\}+\frac{C'}n.$$

Remark 13 The condition on the regularity of $\pi$ is verified, for example, if $\alpha_1>2$ and $\alpha_2>2$. If $\alpha_1=\alpha_2=\alpha$, it is equivalent to $\alpha>2$.
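The regularity condition of Theorem 12 is exactly the requirement $\beta_1>1$ and $\beta_2>1$, where $\beta_i=\alpha_i(1-1/\bar\alpha)$ are the exponents produced by the imbedding used in the proof below. Remark 13 is a special case. The small Python check below verifies both statements on a grid, in exact rational arithmetic. Function names are ours.

```python
from fractions import Fraction as F

def sup_norm_exponents(a1, a2):
    """beta_i = alpha_i (1 - 1/abar), with abar = 2 a1 a2 / (a1 + a2) the harmonic mean."""
    abar = 2 * a1 * a2 / (a1 + a2)
    return a1 * (1 - 1 / abar), a2 * (1 - 1 / abar)

def theorem12_condition(a1, a2):
    """alpha_1 > 3/2 and alpha_2 > max(a1/(2 a1 - 3), 3 a1/(2 a1 - 1))."""
    return a1 > F(3, 2) and a2 > max(a1 / (2 * a1 - 3), 3 * a1 / (2 * a1 - 1))

grid = [F(k, 4) for k in range(1, 21)]   # 1/4, 1/2, ..., 5
agree = all(
    (min(sup_norm_exponents(a1, a2)) > 1) == theorem12_condition(a1, a2)
    for a1 in grid for a2 in grid
)
print(agree)   # True
```

The short-circuit in `theorem12_condition` matters: for $\alpha_1\le3/2$ the first test already fails, so the division by $2\alpha_1-3$ is never evaluated at $\alpha_1=3/2$.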
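Proof: We recall that $\|\pi\|_\infty$ actually denotes $\|\pi\mathbf 1_A\|_\infty$, and we introduce the following set:
$$\Lambda=\Big\{\big|\|\hat\pi\|_\infty-\|\pi\mathbf 1_A\|_\infty\big|\le\frac{\|\pi\mathbf 1_A\|_\infty}2\Big\}.$$
As previously, we decompose the space:
$$\mathbb E\|\tilde\pi-\pi\mathbf 1_A\|_n^2=\mathbb E\big(\|\tilde\pi-\pi\mathbf 1_A\|_n^2\mathbf 1_{\Omega_\rho^*\cap\Lambda}\big)+\mathbb E\big(\|\tilde\pi-\pi\mathbf 1_A\|_n^2\mathbf 1_{\Omega_\rho^*\cap\Lambda^c}\big)+\mathbb E\big(\|\tilde\pi-\pi\mathbf 1_A\|_n^2\mathbf 1_{\Omega_\rho^{*c}}\big).$$
We have already dealt with the third term. For the first term, we can proceed as in the proof of Theorem 2 as soon as
$$\theta p(m,m')\le\mathrm{pen}(m)+\mathrm{pen}(m')$$
with $\theta=3\rho=9/2$ and $p(m,m')=10\|\pi\|_\infty D(m,m')/n$. But on $\Lambda$, $\|\pi\|_\infty\le2\|\hat\pi\|_\infty$, and so
$$\theta p(m,m')=10\theta\|\pi\|_\infty\frac{D(m,m')}n\le20\theta\|\hat\pi\|_\infty\frac{D(m,m')}n\le20\theta\|\hat\pi\|_\infty\frac{D_{m_1}D_{m_2}}n+20\theta\|\hat\pi\|_\infty\frac{D_{m'_1}D_{m'_2}}n.$$
It is sufficient to set $K=20\theta$.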
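Now, inequality (16) gives
$$\mathbb E\big(\|\pi\mathbf 1_A-\tilde\pi\|_n^2\mathbf 1_{\Omega_\rho^*\cap\Lambda^c}\big)\le(\|\pi\|_\infty+4\phi_0^2n^{1/3})P(\Omega_\rho^*\cap\Lambda^c).$$
It remains to prove that $P(\Omega_\rho^*\cap\Lambda^c)\le Cn^{-4/3}$ for some constant $C$. We have
$$P(\Omega_\rho^*\cap\Lambda^c)=P\Big(\big|\|\hat\pi\|_\infty-\|\pi\mathbf 1_A\|_\infty\big|\mathbf 1_{\Omega_\rho^*}>\frac{\|\pi\|_\infty}2\Big)\le P\Big(\|\hat\pi-\pi\mathbf 1_A\|_\infty\mathbf 1_{\Omega_\rho^*}>\frac{\|\pi\|_\infty}2\Big)$$
$$\le P\Big(\|\hat\pi-\pi_{m^*}\|_\infty\mathbf 1_{\Omega_\rho^*}>\frac{\|\pi\|_\infty}4\Big)+P\Big(\|\pi_{m^*}-\pi\mathbf 1_A\|_\infty>\frac{\|\pi\|_\infty}4\Big)$$
$$\le P\Big(\|\hat\pi-\pi_{m^*}\|\mathbf 1_{\Omega_\rho^*}>\frac{\|\pi\|_\infty}{4\phi_0\sqrt{D_{m_1^*}D_{m_2^*}}}\Big)+P\Big(\|\pi_{m^*}-\pi\mathbf 1_A\|_\infty>\frac{\|\pi\|_\infty}4\Big),$$
since $\|\hat\pi-\pi_{m^*}\|_\infty\le\phi_0\sqrt{D_{m_1^*}D_{m_2^*}}\,\|\hat\pi-\pi_{m^*}\|$.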

Furthermore, the inequality $\gamma_n(\hat\pi)\le\gamma_n(\pi_{m^*})$ leads to
$$\|\hat\pi-\pi\mathbf 1_A\|_n^2\le\|\pi_{m^*}-\pi\mathbf 1_A\|_n^2+\frac1{\theta'}\|\hat\pi-\pi_{m^*}\|_f^2+\theta'\sup_{t\in B_f(m^*)}Z_n^2(t)$$
and then, on $\Omega_\rho$,
$$\|\hat\pi-\pi_{m^*}\|_f^2\Big(1-\frac{2\rho}{\theta'}\Big)\le4\rho\|\pi_{m^*}-\pi\mathbf 1_A\|_n^2+2\rho\theta'\sup_{t\in B_f(m^*)}Z_n^2(t).$$
Take $\theta'=3\rho$, and remark that $\|t\|_n^2\le|A_2|\,\|t\|_\infty^2$ for $t$ with support $A$. Then
$$\|\hat\pi-\pi_{m^*}\|^2\le12\rho f_0^{-1}|A_2|\,\|\pi_{m^*}-\pi\mathbf 1_A\|_\infty^2+18\rho^2f_0^{-1}\sup_{t\in B_f(m^*)}Z_n^2(t).$$
Thus
$$P(\Omega_\rho^*\cap\Lambda^c)\le P\Big(\sup_{t\in B_f(m^*)}Z_n^2(t)\mathbf 1_{\Omega^*}>\frac a{D_{m_1^*}D_{m_2^*}}\Big)+P\big(D_{m_1^*}D_{m_2^*}\|\pi_{m^*}-\pi\mathbf 1_A\|_\infty^2>b\big)+P\Big(\|\pi_{m^*}-\pi\mathbf 1_A\|_\infty>\frac{\|\pi\|_\infty}4\Big)\qquad(24)$$
with
$$a=\frac{\|\pi\|_\infty^2}{32\phi_0^2\cdot18\rho^2f_0^{-1}}\quad\text{and}\quad b=\frac{\|\pi\|_\infty^2}{32\phi_0^2\cdot12\rho f_0^{-1}|A_2|}.$$

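We will first study the last two terms in (24). The restriction $\pi_A$ of $\pi$ belongs to $B^{(\alpha_1,\alpha_2)}_{2,\infty}(A)$. Hence the imbedding theorem proved in Nikol'skii [24] p. 236 implies that $\pi_A$ belongs to $B^{(\beta_1,\beta_2)}_{\infty,\infty}(A)$ with $\beta_1=\alpha_1(1-1/\bar\alpha)$ and $\beta_2=\alpha_2(1-1/\bar\alpha)$. The approximation Lemma 9 is still valid for the trigonometric polynomial spaces with the infinite norm instead of the $L^2$ norm, and it then yields
$$\|\pi_{m^*}-\pi_A\|_\infty\le C\big(D_{m_1^*}^{-\beta_1}+D_{m_2^*}^{-\beta_2}\big).$$
And then, since $D_{m_1^*}=D_{m_2^*}\ge\ln n$,
$$D_{m_1^*}D_{m_2^*}\|\pi_{m^*}-\pi\mathbf 1_A\|_\infty^2\le C\big(D_{m_1^*}^{2-2\beta_1}+D_{m_1^*}^{2-2\beta_2}\big)\le C'\big((\ln n)^{2-2\beta_1}+(\ln n)^{2-2\beta_2}\big),$$
which tends to 0. Indeed
$$2-2\beta_1<0\iff2\alpha_1\alpha_2-3\alpha_2-\alpha_1>0,\qquad2-2\beta_2<0\iff2\alpha_1\alpha_2-3\alpha_1-\alpha_2>0,$$
and this double condition is ensured when $\alpha_1>3/2$ and $\alpha_2>\max\big(\frac{\alpha_1}{2\alpha_1-3},\frac{3\alpha_1}{2\alpha_1-1}\big)$. Consequently, for $n$ large enough,
$$P\big(D_{m_1^*}D_{m_2^*}\|\pi_{m^*}-\pi\mathbf 1_A\|_\infty^2>b\big)+P\Big(\|\pi_{m^*}-\pi\mathbf 1_A\|_\infty>\frac{\|\pi\|_\infty}4\Big)=0.$$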
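We will now prove that
$$P\Big(\sup_{t\in B_f(m^*)}Z_n^2(t)\mathbf 1_{\Omega^*}>\frac a{n^{1/3}}\Big)\le\frac C{n^{4/3}}.$$
Since $D_{m_1^*}D_{m_2^*}\le n^{1/3}$, (24) will then give $P(\Omega_\rho^*\cap\Lambda^c)\le Cn^{-4/3}$. We remark that, if $(\varphi_j\otimes\psi_k)_{j,k}$ is an orthonormal basis of $(S_{m^*},\|\cdot\|_f)$, then
$$\sup_{t\in B_f(m^*)}Z_n^2(t)\le\sum_{j,k}Z_n^2(\varphi_j\otimes\psi_k).$$
We also recall that, on $\Omega^*$, $Z_n(t)=\sum_{r=1}^4Z_{n,r}^*(t)$ (see the proof of Proposition 7). Since $(\sum_{r=1}^4z_r)^2\le4\sum_{r=1}^4z_r^2$, we are interested in
$$P\Big(|Z_{n,r}^*(\varphi_j\otimes\psi_k)|\mathbf 1_{\Omega^*}>\Big(\frac a{16D_{m_1^*}D_{m_2^*}n^{1/3}}\Big)^{1/2}\Big),\quad r=1,\dots,4.$$
Let $c_n=\phi_0(D_{m_1^*}D_{m_2^*}/f_0)^{1/2}$, so that $\|\varphi_j\otimes\psi_k\|_\infty\le c_n$. Let $x=Bn^{-2/3}$, with $B$ such that $2\phi_0^2f_0^{-1}B^2+4\|\pi\|_\infty B\le a/16$ (for example $B=\inf\big(1,a/(32(\phi_0^2f_0^{-1}+2\|\pi\|_\infty))\big)$). Then, since $D_{m_1^*}D_{m_2^*}\le n^{1/3}$,
$$\big(\sqrt{2\|\pi\|_\infty x}+c_nx\big)^2\le4\|\pi\|_\infty x+2c_n^2x^2\le\frac a{16\,n^{2/3}}\le\frac a{16D_{m_1^*}D_{m_2^*}n^{1/3}}.$$
So we will now bound $P\big(|Z_{n,1}^*(\varphi_j\otimes\psi_k)|\mathbf 1_{\Omega^*}>\sqrt{2\|\pi\|_\infty x}+c_nx\big)$ using the Bernstein inequality given in [27]. We write $Z_{n,1}^*(u)=\frac1{p_n}\sum_{l=0}^{p_n-1}Y_l^*(u)$ with $Y_l^*(u)=\frac1{4q_n}\sum_{i=4lq_n+1,\,i\ \mathrm{odd}}^{2(2l+1)q_n-1}\Gamma_i^*(u)$. The $Y_l^*(u)$ are independent and distributed as $Y_0(u)=\frac1{4q_n}\sum_{i=1,\,i\ \mathrm{odd}}^{2q_n-1}\Gamma_i(u)$. For $u=\varphi_j\otimes\psi_k$ we have $|\Gamma_i(u)|\le2\|u\|_\infty\le2c_n$. By the martingale property of the $\Gamma_i$, for every integer $m\ge2$,
$$\mathbb E|Y_0(u)|^m\le\Big(\frac{c_n}2\Big)^{m-2}\frac1{16q_n^2}\sum_{\substack{i=1\\ i\ \mathrm{odd}}}^{2q_n-1}\mathbb E[\Gamma_i^2(u)]\le\Big(\frac{c_n}2\Big)^{m-2}\frac1{16q_n}\iint u^2(x,y)f(x)\pi(x,y)dxdy\le\frac{m!}2\|\pi\|_\infty c_n^{m-2}.$$
Thus the Bernstein inequality gives
$$P\big(|Z_{n,1}^*(\varphi_j\otimes\psi_k)|\mathbf 1_{\Omega^*}>\sqrt{2\|\pi\|_\infty x}+c_nx\big)\le2\exp\{-p_nx\}=2\exp\Big\{-\frac{Bn^{1/3}}{4q_n}\Big\},$$
and the same bound holds for $r=2,3,4$. Hence
$$P\Big(\sup_{t\in B_f(m^*)}Z_n^2(t)\mathbf 1_{\Omega^*}>\frac a{n^{1/3}}\Big)\le8D_{m_1^*}D_{m_2^*}\exp\Big\{-\frac{Bn^{1/3}}{4q_n}\Big\}\le8n^{1/3}\exp\Big\{-\frac B4n^{1/6}\Big\}\le\frac C{n^{4/3}},$$
since $q_n\le n^{1/6}$.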